The Journal of
Experimental Education

The report of scientific investigations relating to child development,
[...] curriculum, learning, teaching, supervision, measurements,
statistics, and experimental techniques.

Volume X MARCH, 1942 Number 3








MEASUREMENTS, STATISTICS, AND METHODS OF 
EXPERIMENTAL RESEARCH 


CONTENTS 


Comparison of Subjective and Objective Judgments of Children’s Drawings:
Betty Lark-Horovitz

Steps for the Application of the Johnson-Neyman Technique, A Sample
Analysis: Robert H. Koenker

The Applicability of the Spearman-Brown Formula to Teachers’ Marks in
Colorado State College of Education: Loraine Bruce

The Relation of Primary Mental Abilities to Scholastic Success in Professional
Schools: Dewey B. Stuit and Harry H. Hudson

Relative Difficulty of Test Items of the Revised Stanford-Binet: An Analysis
of Records From a Low Intelligence Group: Arthur L. Rautman





$5.00 A YEAR PUBLISHED QUARTERLY $1.50 A Copy 


Edited and Published by A. S. Barr, Professor of Education, University of Wisconsin.
Entered as second-class matter October 17, 1988 at the post office at Madison, Wisconsin,








EDITORIAL BOARD 
A. S. Barr, Chairman, Professor of Education, University of Wisconsin, Madison, Wis. 


Carter V. Good, Professor of Education, Univer-
sity of Cincinnati, Cincinnati, Ohio. Editorially
responsible for materials [...] published each
September.

J. Wayne Wrightstone, Assistant Director, Bureau
of Reference, Research and Statistics, Board of
Education of the City of New York, 110 Liv-
ingston Street, Brooklyn, New York. Editori-
ally responsible for materials on curriculum
construction, published each June.

Edward E. Cureton, Senior Educational Statisti-
cian, U. S. Office of Education, Washington,
D. C. Editorially responsible for [...]

Arthur T. Jersild, Professor of Education, Teach-
ers College, Columbia University, New York
City. Editorially responsible for materials on
child welfare, guidance, and development, pub-
lished each December.


CONTRIBUTING EDITORS 


Gilbert L. Betts, Supervisor of Graduate Research 
in Education, Colorado State College, Fort 
Collins, Colorado.

William A. Brownell, Professor of Educational 
Psychology, Duke University, Durham, North 
Carolina. 

Leo J. Brueckner, Professor of Education, Univer- 
sity of Minnesota, Minneapolis, Minnesota. 

Oscar K. Buros, Associate Professor of Education, 
Rutgers University, New Brunswick, New 
Jersey. 

Otis W. Caldwell, General Secretary, The Amer- 
ican Association for the Advancement of Sci- 
ence, Boyce Thompson Institute for Plant 
Research, Yonkers, New York. 

Leslie L. Chisholm, Associate Professor of Educa- 
tion, State College of Washington, Pullman, 
Washington. 

Herbert S. Conrad, Associate Professor of Psy- 
chology, College of Agriculture, and Research 
Associate, Institute of Child Welfare, Univer- 
sity of California, Berkeley, California. 

Stephen M. Corey, Professor of Educational Psy- 
chology, University of Chicago, Chicago, Illinois. 

Robert A. Davis, Professor of Education, Director
of Bureau of Educational Research, University
of Colorado, Boulder, Colorado.

Harl R. Douglass, Director of College of Educa-
tion, University of Colorado, Boulder, Colo-
rado.

Jack W. Dunlap, Associate Professor of Educa- 
tional Psychology, University of Rochester, 
Rochester, New York. 

Harold A. Edgerton, Assistant Professor of Psy- 
chology, Ohio State University, Columbus, Ohio. 

Alvin C. Eurich, Professor of Education, Stanford 
University, Stanford University, California. 

John C. Flanagan, Assistant Chief, Research Sec- 
tion, Medical Division, Office, Chief of the Air 
Corps, War Department, Washington, D. C. 

Kai Jensen, Associate Professor of Education, 
University of Wisconsin, Madison, Wisconsin. 

Harold E. Jones, Professor of Psychology, Di- 
rector of Research, Institute of Child Welfare, 
University of California, Berkeley, California. 

Noel Keys, Professor of Education, University of 
California, Berkeley, California. 

Edward A. Lincoln, Consulting Psychologist, 
Halifax, Massachusetts. 

T. E. Newland, Pennsylvania State Department 
of Education, Chief of Special Education, Har-
risburg, Pennsylvania.

C. W. Odell, Associate Professor of Education, 
University of Illinois, Urbana, Illinois. 


Willard C. Olson, Professor of Education, Director
of Research in Child Development, University
of Michigan, Ann Arbor, Michigan.

W. E. Peik, Dean and Professor of Education, Uni-
versity of Minnesota, Minneapolis, Minnesota.

S. L. Pressey, Professor of Educational Psychol-
ogy, Ohio State University, Columbus, Ohio.

Clarence E. [...], Associate Professor of Edu-
cation, University of Wisconsin, Madison, Wis-
consin.

H. H. Remmers, Director, Division of Educational 
Reference, Professor of Education and Psy- 
chology, Purdue University, Lafayette, Indiana. 

Henry D. Rinsland, Professor of Education and 
Director of Educational Research, The Univer- 
sity of Oklahoma, Norman, Oklahoma. 

Robert T. Rock, Jr., Professor of Psychology,
Head of Dept. of Psychology, Graduate School, 
Fordham University, New York City. 

G. M. Ruch, Chief of Research and Statistical
Service, U. S. Office of Education, Washington,
D. C.

P. J. Rulon, Assistant Professor of Education, 
Graduate School of Education, Harvard Uni- 
versity, Cambridge, Massachusetts. 

Douglas E. Scates, Associate Professor of Educa- 
tion, Duke University, Durham, North Carolina. 

David Segel, Educational Consultant, Specialist in 
Tests and Measurements, Federal Security 
Agency, U. S. Office of Education, Washington, 
D. C.


Paul W. Terry, Professor of Educational Psy- 
chology, University of Alabama, University, 
Alabama. 

Helen Thompson, Clinic of Child Development, 
Research Associate, Yale University, New 
Haven, Connecticut. 

Robert L. Thorndike, Associate Professor of Edu- 
cation, Teachers College, Columbia University, 
New York City. 

Herbert A. Toops, Professor of Psychology, Ohio 
State University, Columbus, Ohio. 

T. L. Torgerson, Professor of Education, Univer-
sity of Wisconsin, Madison, Wisconsin.

Helen M. Walker, Professor of Education, Teachers
College, Columbia University, New York City.

Beth L. Wellman, Professor of Psychology, Child 
Welfare Research Station, State University of 
Iowa, Iowa City, Iowa. 

Guy M. Wilson, Professor of Education, Boston 
University, Boston, Massachusetts. 

Paul A. Witty, Professor of Education, Director
of Psycho-Educational Clinic, School of Educa-
tion, Northwestern University, Evanston,
Illinois.

Ernest R. Wood, Professor of Education, New
York University, New York City.














Journal of Experimental Education

Volume X MARCH, 1942 Number 3








COMPARISON OF SUBJECTIVE AND OBJECTIVE JUDGMENTS
OF CHILDREN’S DRAWINGS
Betty Lark-Horovitz


Educational Department, The Cleveland 
Museum of Art* 


The validity of subjective judgment in art 
has often been questioned. On the other 
hand, attempts to establish a basis for objec- 
tive judgments, by defining specific art values 
which may serve as criteria, have been 
attacked and laughed at. The reasons for 
dismissing objective judgment are generally 
that art is something personal, an emotional 
response; that it cannot be analyzed scien- 
tifically or approached by arguments of 
logic; or that so-called objective judgments 
cannot, in the end, be more valuable than 
subjective judgments. 

In order to proceed objectively we have
defined certain qualities, through discussion
and by means of continuous re-terming of
concepts, in such a way that ambiguity and
vagueness are eliminated as far as possible.


Since aesthetic concepts are generally used 
in a vague manner which permits various in- 
terpretations, their compound meanings were 
broken up and reduced to the several factors 
which they contain. Each factor was again 
defined. Whenever a drawing was analyzed 
for the qualities it contained, the same defini- 
tions were used consistently in order to avoid 
personal-subjective interpretations of con- 
cepts. 

Our analysis of a drawing has not the 
direct aim of evaluating it as a drawing. It is 
rather in the nature of a minute and accurate 
description of all observable qualities of the 
drawing, which is followed by a classification 
of the findings. If desirable, a judgment can 
be arrived at from the material assembled 
without even consulting the drawing. 

Subjective judgment also proceeds by
description and observation. The difference
between it and objective judgment lies in the
degree of accuracy reached by means of con-
sistent use of definitions, enumeration of all
available facts, and elimination of personal
responses which are apt to obscure factual
observations by the judge, and to which even
persons who work constantly with objective
methods will yield.

* [This study was] carried out under a grant of the Gen-
eral Education Board of New York to the Educational
Department of The Cleveland Museum of Art. The author
wishes to thank Dr. Thomas Munro for reading this manu-
script and for his suggestions.

In our analysis of children’s drawings** for 
the age levels 6-15, in which we checked all 
qualities which were perceptible in any of the 
actual drawings at every given age level, we 
arrived at age-level norms and found that 
certain art qualities appear, grow or decline 
between the ages of 6 and 15. Objective 
methods of checking qualities were used by 
applying consistently definitions which had 
been agreed upon. 

In order to compare the resulting objective 
judgments with subjective judgments made 
by persons who rank high in the artistic pro- 
fessions, we asked nine judges for their opin- 
ions of a group of 57 drawings made by 12- 
year-old boys and another group of 59 
drawings made by 12-year-old girls. All 
drawings had been part of our objective
analysis. 

The nine judges*** were: two university
professors who specialize in aesthetics, two
professional artists of repute (painters), two
curators of museum collections, and three
art educators (a public school art director
and two supervisors). They all were asked
the same question: “Kindly write on the
blank to the left of each drawing whether
you would classify the drawing as belonging
to the ‘best’ or ‘worst’ of each of these groups.
Explain for either classification the basis of
your judgment in terms of presence or absence
of art (drawing) qualities.”

** This analysis, carried out under a grant of the General
Education Board of New York to the Educational Depart-
ment of The Cleveland Museum of Art, was [...] used for
[...] an estimation of children’s drawings [...]
*** [...] to express the [...] the judges who made this
experiment [...] their helpful assistance and cooperation.

The results showed, first of all, that the 
consensus of opinion of the nine judges was 
limited to a very few cases. Graphs I and II 
give the choices of “best” and “worst” draw- 
ings of boys and girls by the nine judges. 

The choices of boys’ best drawings have a 
majority of judges’ votes for only five draw- 
ings, though 18 in all were chosen. The worst 
drawings have a majority of votes for only 
three out of 16 drawings.* This shows how 
little agreement exists between the judges 
concerning best and worst; that is, the ex- 
tremes of the group. Moreover, when we 
compare the choices of judges of the same 
profession, we find that the two aestheticians 
agree in only two cases out of the 19 choices 
they both made; the two artists agree in
only two cases out of their 20 choices; the
two curators agree on three choices out of
21; and of the three art educators, two agree
on 7 choices out of 17, and all three agree on
three choices out of 29. Furthermore, there
are two drawings, numbers 2066 and 3413,
which were pronounced by one or two judges
as best and by others as worst. Since 18
drawings were pronounced best and 16 worst,
only 23 remained as average.
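
The counting behind these statements can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the votes below are invented (the individual marks of Graphs I and II are not reproduced here), the function names are hypothetical, and a "majority" is taken to mean at least five of the nine judges.

```python
from collections import Counter

N_JUDGES = 9
MAJORITY = N_JUDGES // 2 + 1  # assumption: a majority means 5 of the 9 judges


def tally(votes):
    """Count how many judges chose each drawing.

    votes maps a judge label to the set of drawing numbers that judge chose.
    """
    counts = Counter()
    for drawings in votes.values():
        counts.update(set(drawings))
    return counts


def majority_choices(votes, threshold=MAJORITY):
    """Drawings chosen by at least `threshold` judges."""
    return sorted(d for d, n in tally(votes).items() if n >= threshold)


def contested(best_votes, worst_votes):
    """Drawings that appear among both the best and the worst choices."""
    return sorted(set(tally(best_votes)) & set(tally(worst_votes)))


def remaining_average(total, n_best, n_worst):
    """Drawings left as average, using the article's own subtraction."""
    return total - n_best - n_worst


# Hypothetical votes, invented for illustration only.
best_votes = {
    "I": {101, 102}, "II": {101, 103}, "III": {101}, "IV": {101, 104},
    "V": {101, 102}, "VI": {105}, "VII": {102}, "VIII": {106}, "IX": {103},
}
worst_votes = {"I": {201}, "II": {202}, "IV": {103}, "VII": {201}}

print(majority_choices(best_votes))        # drawings with a majority for "best"
print(contested(best_votes, worst_votes))  # drawings called both best and worst
print(remaining_average(57, 18, 16))       # boys' group, figures from the text
```

With the boys' figures from the text (57 drawings, 18 best, 16 worst) the subtraction gives 23, and with the girls' figures (59, 26, 15) it gives 18.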


The graph of choices among the girls’ 
drawings (Graph II) shows a still greater 
scattering. Of the 59 drawings in the girls’ 
groups, 26 were chosen as best by the judges 
and 15 as worst. Thus the remaining 18 
average drawings appear almost as if they 
were exceptions. 

According to Graph II, no majority of 
opinion could be reached concerning best 
drawings and only three concerning worst 
drawings. Aestheticians, curators, artists, and 
educators differ in their opinions among 
themselves. It seems that more judges would 
have had to be consulted in order to reach a 
majority. 

The more interesting part of this study of
variability in subjective judgments among
judges who are art experts is the analysis of
the comments which explain their choices,
and also the differences in observation of
apparently similar drawing qualities of vari-
ous drawings.

* Both groups of drawings, those chosen as best and those
[...]


All comments** contain one or more qual-
ities which belong to one of the three
following categories: representative quali-
ties, that is, drawing qualities which mainly
enhance the actual representation of the sub-
ject, e.g. showing by placement what the
situation is supposed to be or what the figures
are doing, or more or less correct drawing of the
various objects or people; aesthetic qualities,
i.e. qualities which appear somehow detached
from representation and which manifest
themselves in the arrangement and pattern-
ization of colors, lines or masses; technical
qualities, i.e. drawing qualities which are due
to the handling of the medium in its relation
to the representation, as actually shown or
suggested.


Doubtless none of these three groups of 
qualities can exist by itself, or can be en- 
tirely separated for the purpose of analysis. 
Yet one of the three may dominate, and thus 
influence the judgment; or the judge may 
find one of the three (for instance aesthetic 
qualities) more of a determinant regarding a 
best or worst drawing, and be guided accord- 
ingly. For example, a correct drawing of a 
landscape, in which, however, color organiza- 
tion or balance of masses is neglected, may 
not be chosen because the judge is led to his 
choice by aesthetic qualities rather than 
representative ones. 

The following table*** presents the quali-
ties mentioned most often in the judges’ com-
ments as bases for their choices. The percent-
ages in the table are based on the number of
elements or qualities mentioned by each
judge. Thus, if judge III made 20 remarks
concerning representation, 2 concerning tech-
nique and 15 concerning aesthetic qualities,
he made a total of 37 remarks; therefore
54% of his judgment was based on represen-
tation, or the way the picture parts are
drawn, 5½% on technique, and 40½% of
his judgment rested on aesthetic considera-
tions. Since the content of the comments for
best and worst, boys and girls, differs very
little, all explanatory remarks were combined
and treated together.

** See sample of comments in Appendix I.
*** The divisions Representation, Aesthetic Qualities, Tech-
nique, were made for purposes of an adequate comparison
with our “Ob[...]”, which is [...] into the [...]

TABLE OF QUALITIES MENTIONED IN THE COMMENTS OF THE JUDGES
AS BASES FOR THEIR CHOICES
(In percentages)
(The Roman numerals refer to the different judges)
II III IV V VI VII VIII IX
Representation 55 55 53 58 61 62 [...]
Aesthetic Quality 36 39 38 34 36 37 [...]
Technique 9 6 9 8 3 1 [...]
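
The percentage computation described for judge III can be checked with a short Python sketch. The function name `remark_shares` and the dictionary keys are illustrative assumptions; the remark counts (20, 2 and 15) are the ones given in the text.

```python
def remark_shares(counts):
    """Return each category's share of a judge's remarks, in per cent."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Remark counts for judge III as given in the text.
judge_iii = {"representation": 20, "technique": 2, "aesthetic": 15}

shares = remark_shares(judge_iii)
for category, share in shares.items():
    print(f"{category}: {share:.1f}%")
```

The shares come to roughly 54.1, 5.4 and 40.5 per cent of the 37 remarks, which the article reports as 54%, 5½% and 40½%.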


The distribution of remarks concerning 
drawing or representative qualities, aesthetic 
means, and techniques, varies little among 
the nine judges. All judges, perhaps uncon- 
sciously, seem to think of representation as 
an essential in making a picture, of aesthetic 
means of expression as the next important 
factor, but consider technique of slight im- 
portance for children. 


Hence, there is no doubt that the judges 
agree on the importance of certain groups of 
qualities, and that on the presence of these 
qualities they base their judgment of a best 
drawing and on their absence that of a worst 
drawing. 


What, then, is the reason why they differ
so strongly in their choices? Why does this
difference of opinion go so far as to impel one
judge to pronounce, for instance, drawing
3413 best, while two other judges call it
worst?

Judge II, who considered number 3413
best, says: “Hard to estimate. It is too
simple, but there is something about the de-
sign and the clarity of the boat and the boy
that I like.” In other words, the design (an
aesthetic quality) and the clarity of the boat
(a representational quality) were the reasons
for judge II’s choice. Judge IV, pronouncing the
same drawing worst, says: “Vague, discon-
nected in movement, interest scattered.” In
other words, “vague” (lack of representational
quality), “disconnected in movement” (lack of
aesthetic quality), and “interest scattered” (lack
of organizational, i.e. aesthetic, qualities)
underlie his judgment. And judge VII says
of the same picture: “Drawing lacks syn-
thesis. Parts unrelated, and shows lack of
imagination.” In other words: “lacks synthe-
sis” and “parts unrelated” (absence of organiza-
tional, i.e. aesthetic, qualities); “lack of imagi-
nation” (lack of representational and aesthetic
qualities).


Thus differences in standards are not the 
cause of the scattering of choices, but rather 
differences in their practical application. The 
standards, however, are vague, as can be seen 
from nearly all comments. To say of a draw- 
ing that it is “well composed”, or it has “no 
positive value”, or it is “below age level, yet 
somehow transcendent”, or “it is like a 
Chinese painting” explains little about the 
qualities in the drawing. “Well composed” is 
a general term which, unless broken up into 
more specific components, cannot be applied. 
The expression “no positive value” raises the 
question of what a positive value is, and in 
what way, after being defined, it applies to a 
drawing. Again, the remark “it is like a 
Chinese painting” implies that similarity to 
any Chinese painting, be it good or bad, as 
long as it is Chinese, heightens the value of 
the drawing. 


An objective analysis of a drawing does 
not directly make a choice between good or 
bad, best or worst. It states, however, in 
detailed and fairly definite terminology, the 
presence of any qualities in a drawing and,
wherever necessary, their degree. From a chart
thus obtained (Chart I) for age level 12, we 
find a number of quality checks for those of 
the boys’ and girls’ best drawings which were 
chosen by the nine judges. 

Such objective checks can be compared
with the norms which were established with
the help of the charted qualities of the chil-
dren’s drawings of the age levels 6-15 and
which, for age level 12, can be seen in
Chart II.


The best drawings of the boys, such as
697, 1338, 1343, 1804 and 4034, which were
chosen by a majority of judges, would have
been marked superior also as a result of
objective quality checks, since they combine
a number of qualities which, according to
Chart II, are all outstanding, “rare”, for this
age level; in fact are still rare for age level
15. Of the drawings which were chosen by
one, two or three judges, such as 1802, 3464,
3537 and others, some show no outstanding
quality whatsoever, some check on a few.
Objective analysis would rank these drawings
among average ones. Worst drawings would
be classified as poor or average according to
our objective analysis.

There is one interesting question which is 
being answered unknowingly by the judges’ 
comments. All drawings marked best by a 
majority of judges are representationally 
true-to-appearance; some also show perspec- 
tive. A few of the drawings chosen as best by 
single judges are representationally schematic 
or mixed, that is, average for age-level 12, 
while some of these, according to objective 
analysis, are checked for not a single out- 
standing quality, others are checked for one 
or two of the higher qualities, such as blended 
tints, texture or moderately effective use of 
medium. But, and this is significant, not a 
single drawing was marked worst which was 
representationally advanced. The interpreta-
tion of this fact, and the fact that all draw-
ings marked by a majority as best are repre-
sentationally accelerated, and that the
amount of comments on representation alone
is about 60%, can only point to the follow-
ing: representation is an important and de-
ciding factor in judging children’s drawings,
and especially so when the drawing is not
sufficiently supported by aesthetic qualities;
on the other hand, a representationally aver-
age drawing qualifies as superior only if it is
enriched by a number of technical and
aesthetic qualities. The conclusion is that, in
an objective analysis, checks on representa-
tional qualities must be weighted heavily
because of their importance.
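
What a heavier weighting of representational checks could look like is sketched below in Python. This is an assumption-laden illustration, not a procedure given in the article: the weights, the category names and the two example drawings are all hypothetical, and the article does not state how heavily any quality should be weighted.

```python
# Hypothetical weights: the article only says representational checks
# "must be weighted heavily"; these values are invented for illustration.
HYPOTHETICAL_WEIGHTS = {"representation": 3.0, "aesthetic": 1.0, "technique": 1.0}


def weighted_score(checks, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the quality checks of a drawing, each multiplied by its category weight."""
    return sum(weights[category] * n for category, n in checks.items())


def unweighted_score(checks):
    """Plain count of quality checks, ignoring category."""
    return sum(checks.values())


# Two invented drawings: one representationally advanced with nothing else,
# one representationally average but with several other checks.
advanced = {"representation": 2, "aesthetic": 0, "technique": 0}
average_enriched = {"representation": 0, "aesthetic": 3, "technique": 2}

print(unweighted_score(advanced), unweighted_score(average_enriched))
print(weighted_score(advanced), weighted_score(average_enriched))
```

Under these invented weights the representationally advanced drawing scores 6.0 against 5.0, although a plain count of checks would give 2 against 5. The sketch only shows that the choice of weights can reverse a ranking.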


Of the two drawings, 2066 and 3413, which
were ranked both best and worst, No. 2066
is checked on a single outstanding quality:
intentionally indefinite form, which is a
quality of sophistication, the value of which
in the drawing depends on other qualities
with which it may be linked. Number 3413,
while representationally average, is checked
for “subtle or delicate” line, which quality
may have been a dominant determinant in
the mind of the particular judge who marked it
best, while the opinions of the two judges
who classified it as worst were weighted by
the absence of other qualities which they
deemed more important.


Thus, single judges will frequently consider
a drawing outstanding because one
or two qualities appeal to them strongly
enough to obliterate the lack of all other
qualities, while objective analysis would
mark it as average. If, however, all nine
judges were of the opinion that the strength 
of the one outstanding quality carries the 
drawing, such judgment would be decisive. 
In fact, although norms can be established 
with the help of objective analysis, the im- 
portance of each rare or superior quality 
remains undetermined. In other words, one 
cannot add the checks on superior qualities 
and say: Johnny has five superior qualities, 
Bobby has seven, therefore Bobby has more 
ability than Johnny. Exactly the opposite 
may be true. Yet the importance of the in- 
tensity of a single quality, or that of the 
accumulation of several qualities, cannot be 
determined from the responses of one or two 
judges, since their responses differ. They do 
not differ, however, when a greater number 
of outstanding qualities show in one single 
drawing. 


Summarizing, we may say: unless a suffi- 
cient number of expert judges are consulted, 
subjective judgments, though based on sim- 
ilar aesthetic principles, differ to such an 
extent that they seem unreliable. Compared 
with them, judgments based on objective 
analysis appear more reliable. Since it is 
difficult and inconvenient to assemble nine 
or more art experts for purposes of deciding 
children’s art abilities, and since objective 
analysis could either be handled by a trained 
specialist for groups of schools or by a central 
office for a whole state, the establishment and 
standardization of norms for all branches of 
art activity and all age levels up to college 
age is recommended. 





GRAPH I
Boys’ best drawings and boys’ worst drawings, by identification no. of drawing
(Roman numerals identify judges)
[Individual marks of the graph are not recoverable.]
Total number of best drawings = 18
Total number of worst drawings = 16
* Indicates same drawing in selections of best and worst.

GRAPH II
Girls’ best drawings and girls’ worst drawings, by identification no. of drawing
(Roman numerals identify judges)
[Individual marks of the graph are not recoverable.]
Total number of best drawings = 26
Total number of worst drawings = 15
* Indicates same drawing in selections of best and worst.

CHART I (GIRLS) and CHART I (BOYS)
Column groups: REPRESENTATION; AESTHETIC MEANS; TECHNIQUE
[Individual checks of the chart are not recoverable.]
* Indicates same drawing in selection of best and worst.
** For definitions of headings see Appendix II.
CHART II
NORMS OF ART QUALITIES FOR AGE LEVEL 12
(For definitions of headings see Appendix II)
Row headings legible in the chart: Schema; Mixed stage; True-to-appearance;
Perspective; One plane; Many planes; Planes showing distance; Motion
successful; Organized grouping, unusual; Bold line; Subtle line; Decorative
line; Ragged area; Blended tints; Bold areas; Intentional texture; Color
variation; Color distance; Color outstanding; Modeling; Decorative
arrangement; Forms intentionally indefinite; Planned asymmetry; Consistent
use of medium; Moderately effective use of medium; Unusually effective use
of medium; Inconsistent use of medium.
[The norm values of the chart are not recoverable.]

APPENDIX I

SAMPLE COMMENTS ON CHOICES OF BEST AND WORST DRAWINGS
(Boys, Age 12)

BEST

AESTHETICIANS:
I, no. 1802: Crude in detail but strong in main divisions. [Bold?]
silhouette of skyline, tilting of motorboat and its structure indicated.
Could do better if he tried. Schematic figure careless.
II, no. 3537: Simple, but the figures are good, and the whole picture a
good symbolization.

ARTISTS:
III, no. [illegible]: Expressive interpretation of strongly visualized
memories. Many elements woven into one dynamic unit.
IV, no. [illegible]: Excellent design organization. Very convincing
drawing. Clever introduction of the puppet operator and the [sound?].

COLLECTION CURATORS:
V, no. [illegible]: A very ambitious composition, ably handled. A little
vague about the swimmers, but the divers are extraordinary. Good colors.
A “complete” picture.
VI, no. [illegible]: This child is a born impressionist. The picture is
good because the draughtsman has a conception of the subject in hand: a
diving figure above water. The composition is limited to the essential
area, and well composed. There are no details, but there are full
indications of the idea which the child intended to convey. The actual
drawing is distinctly suggestive.

ART EDUCATORS:
VII, no. [illegible]: Feeling for movement and unity in composition.
Ability to eliminate non-essentials. Looseness in quality of drawing.
VIII, no. [illegible]: Well conceived and well expressed concepts. The
puppet [...] has individuality characteristic of a puppet.
IX, no. [illegible]: Interesting repetition of color. Repetition of line
movement effectively used. Good composition.

WORST

AESTHETICIANS:
I, no. [illegible]: Weak and dull. A careless scratch or two only, with
little or no indication of scene or situation. No attempt to fill page,
no decorative sense. Slight indication of perspective in bed (?) shows
that he could have done better if he had cared to.
II, no. [illegible]: Crude and meaningless.

ARTISTS:
III, no. [illegible]: Diagrammatic expression, no movement, no joy in
reproducing a colorful, vital image.
IV, no. [illegible]: No conception of design or organization. Bad drawing.

COLLECTION CURATORS:
V, no. [illegible]: No composition. Bad drawing. No sense of proportion.
VI, no. [illegible]: Plenty of ideas but no sense of how to convey them in
drawing. Little feeling for surface quality, use of crayons, color.
Proportion and perspective show some understanding but not carried out.

ART EDUCATORS:
VII, no. [illegible]: Inability to visualize movement. No relation between
figure and water. Stiffness of figure [more?] suggestive of the work of a
much younger child.
VIII, no. [illegible]: Lack of observation or feeling. No sense of the
vertical; timid; indefinite.
IX, no. [illegible]: No sense of composition, of adequate filling in of
space. Poor drawing of human figure for 12 yr. old. Little feeling for
color. Few ideas.


APPENDIX II 


DEFINITION OF HEADINGS OF 
CHARTS I AND II 


Representation 
Schema: 
Schematic drawings represent objects 
as they are believed to be, regardless 
of their actual, visual appearance. 
Size relationships are incorrect; for 
instance, the head of the figure may 
be three times as large as the body, 
or, the body and legs may be elon- 
gated many times their length while 
the head and arms remain small. All 
objects are drawn part by part, out- 
lining each, joining them in an addi- 
tive process, during which the parts 
are often connected at the wrong 
places. Top, side and front views are 
combined, objects and figures are 
often made transparent. These char- 
acteristics apply not only to single 
objects but also to compositions. 
Mixed Stage: 

It bears some of the characteristics 
of both the true-to-appearance and 
schematic stages combined. Interme- 
diate between the two. 


True-to-appearance: 


These drawings represent objects and 
human figures approximately as they 
appear when seen from one single 
viewpoint. Parts which are hidden in 
a particular view, though existent, 
are not represented. Size relation- 
ships are fairly correct. Parts are 
joined at the correct places. Though 
outlines overlap and thus indicate 
depth, modeling or representation of 
space through perspective may, or 
may not be represented. 


Perspective:


Linear perspective is a representation 
of objects as if in three dimensional 
space by means of diminishing sizes, 
converging lines and foreshortening. 


One Plane:* 


The objects are all placed in one pic- 
ture plane, not overlapping, and as 
though they all were at equal dis- 
tances from the observer. 
There is no depth suggested, no per-
spective, i.e. convergence, and no
diminution of sizes.

Many Planes:* 
The use of many planes is a more 
advanced development. It necessarily 
implies the suggestion of distance 
through overlapping of objects, in- 
dicating that some objects are far- 
ther from the observer than others. 
Many planes may be indicated with 
or without linear perspective, i.e. 
convergence. 


[Vol. 10, No. 3 


Planes showing distance :* 
The use of planes showing distance is 
an advanced development of repre- 
sentation. They usually only occur in 
landscapes, and the distance may be 
shown either through perspective, 
through multiplicity of planes with 
diminishing sizes, through variation 
of light and dark, i.e. paling of colors 
at the horizon or variation of hues. 

Motion successful: 
It is represented in the human figure 
or animals by flexibility of the body, 
bending of limbs, coordination of 
parts; in inanimate objects by lines 
symbolizing movement, such as cir- 
cular whirls around wheels, parallel 
lines to indicate air rushing past, or 
by blurred parts. 


Aesthetic Means 
Organized Grouping; Unusual: 
It is checked when three or more 
objects or figures are definitely organ- 
ized into a group, balanced and inter- 
related in regard to aesthetic effect 
by means of line-, color- or area- 
arrangement. 
Line: 

Bold:* 
Bold lines are those drawn with 
sufficient vigor to give a contrast 
with the surrounding areas and must 
show control of the medium as well 
as use for an aesthetic effect. 

Subtle-delicate :* 
Subtle lines show a controlled and 
varying gradation of width and thick- 
ness in order to emphasize parts of 
the drawing. Delicate lines show pre-
cision and decisiveness though ap-
plied delicately and faintly for the
sake of an aesthetic effect.

Decorative use: 
Decorative line is used either rhyth-
mically or in repeats to achieve
through this technique an aesthetic
effect.


Area: 
Ragged-smudgy :* 
These areas are filled in erratically
and negligently producing an uneven
effect, either because of technical in- 
ability or lack of interest or uncer- 
tainty about the desired effect. 

Graded :* 
In these areas, a hue or several hues 
are shaded from light to dark. 

Blended tints:* 
Areas are such that various hues are
mingled in one area. 

Bold :* 
Bold areas are not merely thick in 
layer but suggest, through color and 
execution, daring contrasts or com- 
binations. 

* See footnote next page. 





March, 1942] 


Intentional Texture :* 
These areas show intentional varia-
tions of the surface for the sake of
representing some texture, e.g. grass,
or for the sake of an aesthetically
important effect in contrast to other
areas.


Color: 


Variation in local colors: 
Local colors are such colors as are 
characteristic or “true” for known 
objects, people or animals, e.g. grass 
is green, forests—even in the distance 
—are green, sky and water are blue, 
etc. Variations introduce shadings of 
the true colors, e.g. several greens for 
trees or grass, several blues for sky 
or water. 

Distance :* 
Distance is indicated by light or 
greyed hues, e.g. purplish mountains. 

Outstanding :* 
Outstanding use of color is an unusu- 
ally effective use obtained either by 
contrasts, subtlety, or organization 
(symmetry, isolated color spots). 

Modeling :* 
This is represented by sufficient grad- 
ing with the help of the medium 
(pencil or crayon) to suggest solidity 
and shape. It may include shading 
from dark to light, whether or not 
related to a definite source of light. 

Decorative Arrangement: 
Such an arrangement shows a con- 
scious distribution and emphasis of 
lines, colors or shapes either by using 
some modes of balance or by repeats 
or by subdivision into units which 
elaborate one or more themes for the 
sake of an aesthetic effect, e.g. 
symmetry. 

Forms intentionally indefinite :*
They are achieved by outlines pur-
posely blurred or dimmed in order
to interrelate with the background.

Planned Asymmetry :*
Intentional asymmetry is a purpose-
ful grouping of objects in an asymmetrical
manner so as to attain a satisfactory
feeling of stability without the use
of a central axis.


Technique
Use of Medium:

Consistent:
A medium is used consistently when
the tools and colors or ink are ap-
plied in such a way that they follow
the natural functions of the tools.
For example, a fine-point tool will
not make broad strokes, or a coarse
tool, fine lines. Thus, to be consistent,
the number of details, when rendered
with a blunt crayon point, must be
reduced or the details rendered in
large sizes.

Moderately effective :*
It is achieved when the effect is be-
yond plain consistency.

Unusually effective :*
Such use is observable when the
medium is handled with such skill
that the effect obtained surpasses by
far the usual expectations for this
medium.

Inconsistent :*
A medium is used inconsistently
when effects are attempted which are
incompatible with it and which point
to the use of a different medium, e.g.
when fine lines, closely drawn, are
attempted with a blunt crayon though
they would best be drawn by pencil-
point.


*The content of this definition, though re-worded, is identical with the one used in Lark-Horovitz, Barnhart and Sills, “Graphic Work-Sample Diagnosis, An Analytic Method of Estimating Children’s Drawing Ability”, The Cleveland Museum of Art, Cleveland, Ohio, 1939.


STEPS FOR THE APPLICATION OF THE JOHNSON-NEYMAN 
TECHNIQUE—A SAMPLE ANALYSIS 
ROBERT H. KOENKER
University of Minnesota


CARL W. HANSEN
Quincy, Ill., Public Schools


In a recent investigation by one of the 
above authors, an attempt was made to dis- 
cover if certain underlying characteristics 
distinguish excellent achievers from poor 
achievers in two-figure division. A screening 
test in two-figure division was administered 
to 283 pupils in 6B classes in 9 elementary 
schools. After the 283 pupils were found to 
be homogeneous in division with respect to 
both variability and mean achievement, they 
were grouped as one distribution. The upper 
one-third, the 90 best achievers, were selected 
as excellent achievers, and the lower one- 
third, the 90 poorest achievers, were selected 
as poor achievers. The two groups were then 
measured on a number of factors thought to 
be associated with ability in two-figure divi- 
sion. The two groups were first compared on 
these factors by means of the “t” test for the 
significance of the difference between two 
means. This comparison showed that excel- 
lent achievers were significantly superior to 
poor achievers on all factors. 


Since excellent achievers might be superior 
to poor achievers on these factors because of 
such variables as mental age and chronolog- 
ical age, it seemed necessary to compare the 
two groups with the effects of these two vari- 
ables ruled out. This comparison could be 
made by means of a matching technique, but 
this approach is of limited value, because so 
many cases are often lost in matching that 
final groups are not representative of initial 
groups. To overcome this difficulty the 
Johnson—Neyman technique was used.* This 
technique consists essentially in testing the 
statistical significance of the best estimate 
of the difference in achievement between two 
groups, when the two groups are matched 
statistically on two basic characters. This
technique eliminates the need for an
individual-for-individual matching technique,
and also makes it possible to determine at
what level of significance the two groups
differ, as well as to determine the limits of
the regions wherein the differences between
the two groups are significant. This technique
developed by Johnson and Neyman is rapidly
becoming recognized as one of great value to
experimenters in education; it has been used
to great advantage in many recent doctorate
theses at the University of Minnesota and
elsewhere. A sample analysis of the steps in-
volved in this technique is given in the fol-
lowing pages, for the benefit of those who
may wish to apply it to the study of their
problems. The basis of the analysis is an ex-
planation of the technique as used in compar-
ing 90 excellent achievers in division and 90
poor achievers in division on ability in sub-
traction,** when the effects of mental age
and chronological age have been statistically
controlled.

* Palmer O. Johnson and J. Neyman, “Tests of Certain Linear Hypotheses and Their Application to Some Educational Problems,” Statistical Research Memoirs, I (June, 1936), 72-93. Those interested in a discussion of the mathematical basis of the technique may refer to the original.


Step One. Compute the basic statistics for
the dependent variable, subtraction, and the
independent variables, mental age and chron-
ological age. Basic computations involve
means, standard deviations, and inter-
correlations of the variables. Pertinent data
are shown in Table I. The statistics for ex-
cellent achievers are designated by the sub-
script 1, the prime, and the letter z. The
corresponding statistics for poor achievers
are designated by the subscript 2, the double
prime, and the letter u.


Step Two. Translate the problem into the form of a linear hypothesis. The null hypothesis which is set up for this study is that in the population from which the sample is drawn, the difference between excellent and poor achievers with respect to achievement in subtraction is zero, when the effects of chronological age and mental age have been removed. This hypothesis has been designated as H(x′,y′). The problem then reduces to the test of the hypothesis H(x′,y′) that no significant difference exists between excellent and poor achievers in ability in subtraction, and the determination of a region of significance if the hypothesis H(x′,y′) is rejected at a specified level.

**In the original study excellent and poor achievers were compared on twelve factors by means of the Johnson-Neyman technique.


TABLE I

STATISTICS COMPUTED FOR APPLICATION OF
JOHNSON-NEYMAN ANALYSIS FOR
SUBTRACTION TEST*

Excellent Achievers        Poor Achievers

x̄₁   = 15.122222           x̄₂   = 10.066667
ȳ₁   = 10.833333           ȳ₂   = 14.022222
z̄    = 25.277778           ū    = 20.733333
σ′x  =  4.484115           σ″x  =  4.642317
σ′y  =  4.674398           σ″y  =  5.975836
σ′z  =   .954845           σ″u  =  5.836285
r′xy =  −.089145           r″xy =  −.199112
r′xz =   .233412           r″xu =   .103590
r′yz =   .050203           r″yu =  −.283369
N₁   = 90                  N₂   = 90

*To simplify computations all the values used in this analysis are in terms of code scores. To obtain the code scores the midpoint of the lowest raw score interval was subtracted from each raw score, and the remainder was divided by the height of the interval. The raw scores, raw means, and raw standard deviations may be obtained from their respective code values by use of the following formulas:

mental age            X = (x)3 + 109,
chronological age     Y = y + 123,
subtraction score     Z = z + 4.

A code score of 15 in mental age can be transformed to a raw score by multiplying 15 times 3 and adding 109. A code score of 10 in chronological age can be transformed to a raw score by adding 10 and 123. A code score of 25 in subtraction can be transformed to a raw score by adding 25 and 4. The coded standard deviations for chronological age and subtraction are the same as the raw standard deviations since the height of the interval is only one. However, in case of mental age the coded standard deviation value of 4.484115 must be multiplied by 3, the height of the interval, to obtain the raw standard deviation.


Step Three. Obtain an estimate of the difference in subtraction between excellent and poor achievers for each set of values of the basic matching characters. To obtain this estimate, the linear regression of subtraction upon the basic matching characters, mental age and chronological age, is found separately for excellent and poor achievers, and then the difference between these two regressions is found. This difference is labelled F(x′,y′) and the equation is as follows:

$$F(x',y') = \left[\bar{z} + \frac{r'_{xz} - r'_{xy}r'_{yz}}{1 - r'^2_{xy}}\,\frac{\sigma'_z}{\sigma'_x}\,(x' - \bar{x}_1) + \frac{r'_{yz} - r'_{xy}r'_{xz}}{1 - r'^2_{xy}}\,\frac{\sigma'_z}{\sigma'_y}\,(y' - \bar{y}_1)\right] - \left[\bar{u} + \frac{r''_{xu} - r''_{xy}r''_{yu}}{1 - r''^2_{xy}}\,\frac{\sigma''_u}{\sigma''_x}\,(x' - \bar{x}_2) + \frac{r''_{yu} - r''_{xy}r''_{xu}}{1 - r''^2_{xy}}\,\frac{\sigma''_u}{\sigma''_y}\,(y' - \bar{y}_2)\right].$$

Substituting values from Table I in the first half of the foregoing equation, the regression equation for the excellent achievers in subtraction is obtained. Substituting the values, the equation is

$$z(x',y') = 25.277778 + \frac{(.233412) - (-.089145)(.050203)}{1 - (-.089145)^2}\,\frac{.954845}{4.484115}\,(x' - 15.122222) + \frac{(.050203) - (-.089145)(.233412)}{1 - (-.089145)^2}\,\frac{.954845}{4.674398}\,(y' - 10.833333).$$

Carrying through the computations, an estimate of the adjusted mean in subtraction for the excellent achievers is obtained. This value is

$$z(x',y') = 24.347216 + .051061x' + .014622y'.$$
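The arithmetic of Step Three for the excellent achievers can be checked with a short sketch such as the one below. The function name `adjusted_mean_plane` and its argument names are illustrative assumptions, not part of the original analysis; the numbers are the Table I code-score values.

```python
def adjusted_mean_plane(mean_x, mean_y, mean_z, sd_x, sd_y, sd_z, r_xy, r_xz, r_yz):
    """Regression of z on x and y from means, standard deviations and correlations.

    Returns (intercept, slope_x, slope_y) so that z = intercept + slope_x*x + slope_y*y.
    The name and signature are hypothetical, chosen for this illustration.
    """
    denom = 1.0 - r_xy ** 2
    slope_x = (r_xz - r_xy * r_yz) / denom * sd_z / sd_x
    slope_y = (r_yz - r_xy * r_xz) / denom * sd_z / sd_y
    intercept = mean_z - slope_x * mean_x - slope_y * mean_y
    return intercept, slope_x, slope_y


# Table I, excellent achievers (code scores).
intercept, slope_x, slope_y = adjusted_mean_plane(
    mean_x=15.122222, mean_y=10.833333, mean_z=25.277778,
    sd_x=4.484115, sd_y=4.674398, sd_z=0.954845,
    r_xy=-0.089145, r_xz=0.233412, r_yz=0.050203,
)
print(f"z(x', y') = {intercept:.6f} + {slope_x:.6f}x' + {slope_y:.6f}y'")
```

The same function, called with the poor achievers' statistics, gives the second half of F(x′,y′).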


Substituting values from Table I in the second half of the equation for the value F(x′,y′), the regression equation for the poor achievers in subtraction is obtained. Substituting the values, the equation is

$$u(x',y') = 20.733333 + \frac{(.103590) - (-.199112)(-.283369)}{1 - (-.199112)^2}\,\frac{5.836285}{4.642317}\,(x' - 10.066667) + \frac{(-.283369) - (-.199112)(.103590)}{1 - (-.199112)^2}\,\frac{5.836285}{5.975836}\,(y' - 14.022222).$$

[Figure I: scatter plot with Mental Age (X) as abscissa and Chronological Age (Y) as ordinate, raw and code scales shown; legend: × Excellent, • Poor, CA Center of Accuracy; the line of non-significance is marked.]

Figure I. Comparison of excellent achievers (×) and poor achievers (•) in subtraction when matching is effected on chronological age, and on mental age as measured by The California Test of Mental Maturity, Elementary Series, Grades 4-8.











Carrying through the computations, an estimate of the adjusted mean in subtraction for the poor achievers is obtained. This value is

$$u(x',y') = 23.858492 + .061747x' - .267201y'.$$

The best estimate of the difference between adjusted means F(x′,y′) is obtained by subtracting u(x′,y′) from z(x′,y′):

$$F(x',y') = (24.347216 + .051061x' + .014622y') - (23.858492 + .061747x' - .267201y');$$

$$F(x',y') = .488724 - .010686x' + .281822y'.$$

Since the effect of mental age and chronological age on ability in subtraction is predicted from the regression equation, individual matching on the two characters is not necessary. The value of F(x′,y′) changes as the values of the basic characters change, and may be positive, negative, or zero. When all known values from a given set of data except those of x and y, which are variable, are substituted in the equation for F(x′,y′), an equation of the form F(x′,y′) = a + bx + cy is obtained. When F(x′,y′) = 0, no difference between excellent and poor achievers exists with respect to ability in subtraction. The line represented by the equation a + bx + cy = 0 is thus called the line of non-significance. In this example the equation for the line of non-significance is

$$F(x',y') = .488724 - .010686x' + .281822y' = 0.$$

This line is represented by oη on Figure I. The procedure in plotting this line will be explained later. Excellent achievers and poor achievers, whose coordinates in mental age and chronological age locate them upon this line of non-significance, may be expected to show the same achievement in subtraction. On one side of the line of non-significance, the value of F is favorable to excellent achievers; on the other side, it is favorable to the poor achievers, but not significantly so unless the cases fall within a region of significance.

Step Four. Obtain the absolute minimum of the sum of the squares. Compute the total variance multiplied by the degrees of freedom, N₁ + N₂ − 6, where N₁ is the number of excellent achievers, N₂ is the number of poor achievers, and 6 is the number of statistical constants employed in testing the hypothesis H(x′,y′). This estimate is designated as S²ₐ, where

$$S_a^2 = N_1\sigma'^2_z\,\frac{1 - r'^2_{xy} - r'^2_{xz} - r'^2_{yz} + 2r'_{xy}r'_{xz}r'_{yz}}{1 - r'^2_{xy}} + N_2\sigma''^2_u\,\frac{1 - r''^2_{xy} - r''^2_{xu} - r''^2_{yu} + 2r''_{xy}r''_{xu}r''_{yu}}{1 - r''^2_{xy}}.$$

Substituting the values from Table I, the equation becomes

$$S_a^2 = 90(.954845)^2\,\frac{1 - (-.089145)^2 - (.233412)^2 - (.050203)^2 + 2(-.089145)(.233412)(.050203)}{1 - (-.089145)^2} + 90(5.836285)^2\,\frac{1 - (-.199112)^2 - (.103590)^2 - (-.283369)^2 + 2(-.199112)(.103590)(-.283369)}{1 - (-.199112)^2};$$

$$S_a^2 = 2889.504635.$$
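As a check on Step Four, the sum of squares can be recomputed from the Table I values. The sketch below is illustrative only; the helper name `residual_sum_of_squares` is an assumption and does not come from the original article.

```python
def residual_sum_of_squares(n, sd_dep, r_xy, r_xd, r_yd):
    """One group's term of S_a^2.

    n      number of cases in the group
    sd_dep standard deviation of the dependent variable (subtraction)
    r_xy   correlation between the two matching characters
    r_xd   correlation of mental age with the dependent variable
    r_yd   correlation of chronological age with the dependent variable
    """
    numerator = 1.0 - r_xy**2 - r_xd**2 - r_yd**2 + 2.0 * r_xy * r_xd * r_yd
    return n * sd_dep**2 * numerator / (1.0 - r_xy**2)


# Table I values for excellent and poor achievers (code scores).
s2a = (
    residual_sum_of_squares(90, 0.954845, -0.089145, 0.233412, 0.050203)
    + residual_sum_of_squares(90, 5.836285, -0.199112, 0.103590, -0.283369)
)
print(round(s2a, 3))
```

Nearly all of the total comes from the poor achievers, whose subtraction scores have a much larger standard deviation.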








Step Five. Compute the weighting factor (P + Q).

$$P + Q = \frac{1}{N_1}\left[1 + \frac{1}{1 - r'^2_{xy}}\left\{\frac{(x' - \bar{x}_1)^2}{\sigma'^2_x} - 2r'_{xy}\frac{(x' - \bar{x}_1)(y' - \bar{y}_1)}{\sigma'_x\sigma'_y} + \frac{(y' - \bar{y}_1)^2}{\sigma'^2_y}\right\}\right] + \frac{1}{N_2}\left[1 + \frac{1}{1 - r''^2_{xy}}\left\{\frac{(x' - \bar{x}_2)^2}{\sigma''^2_x} - 2r''_{xy}\frac{(x' - \bar{x}_2)(y' - \bar{y}_2)}{\sigma''_x\sigma''_y} + \frac{(y' - \bar{y}_2)^2}{\sigma''^2_y}\right\}\right].$$

The value of (P + Q) depends upon the values of the basic characters of matching, the x's and y's. Where x′ and y′ lie near the population means of x and y, the value of (P + Q) becomes small, since (P + Q) becomes 1/N₁ + 1/N₂ when x̄₁ = x′ = x̄₂ and ȳ₁ = y′ = ȳ₂. Substituting the values from Table I in the equation for (P + Q), the equation becomes

$$P + Q = \frac{1}{90}\left[1 + \frac{1}{1 - (-.089145)^2}\left\{\frac{(x' - 15.122222)^2}{(4.484115)^2} - 2(-.089145)\frac{(x' - 15.122222)(y' - 10.833333)}{(4.484115)(4.674398)} + \frac{(y' - 10.833333)^2}{(4.674398)^2}\right\}\right] + \frac{1}{90}\left[1 + \frac{1}{1 - (-.199112)^2}\left\{\frac{(x' - 10.066667)^2}{(4.642317)^2} - 2(-.199112)\frac{(x' - 10.066667)(y' - 14.022222)}{(4.642317)(5.975836)} + \frac{(y' - 14.022222)^2}{(5.975836)^2}\right\}\right];$$

$$P + Q = .001094x'^2 + 2(.000131)x'y' + .000837y'^2 + 2(-.015508)x' + 2(-.011652)y' + .366918.$$

When all values except those of the variables x′ and y′ are substituted from the data, a quadratic equation of the form

$$S_a^2(P + Q) = A'x'^2 + 2B'x'y' + C'y'^2 + 2D'x' + 2E'y' + H'$$

is obtained in which A′, B′, C′, D′, E′, and H′ are functions of the correlation coefficients and the standard deviations. When S²ₐ(P + Q) is equated to any positive constant value, the equation of the ellipse E is obtained; this ellipse is variable in size, depending upon the value of the positive constant to which S²ₐ(P + Q) is equated. The center of the ellipse E is the point where (P + Q) is at a minimum. This point is called the center of accuracy and is denoted by CA. This is the point at which the variance of the best linear estimate of the difference between the means of the two groups attains its minimum value with respect to subtraction.

Before the location of the center of accuracy can be determined the value S²ₐ(P + Q) for this particular example must be computed. Substituting values from previous computations,

$$S_a^2(P + Q) = 2889.504635\,[.001094x'^2 + 2(.000131)x'y' + .000837y'^2 + 2(-.015508)x' + 2(-.011652)y' + .366918];$$

$$S_a^2(P + Q) = 3.160748x'^2 + 2(.377586)x'y' + 2.417299y'^2 + 2(-44.810802)x' + 2(-33.669580)y' + 1060.210256.$$
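The expansion of (P + Q) into a quadratic in x′ and y′ is tedious by hand. A minimal sketch of the same expansion follows; the helper `pq_terms` and its argument order are assumptions made for illustration, and the inputs are the Table I code-score values together with the S²ₐ value of Step Four.

```python
def pq_terms(n, mean_x, mean_y, sd_x, sd_y, r_xy):
    """One group's contribution to P + Q, expanded as
    A*x^2 + 2*B*x*y + C*y^2 + 2*D*x + 2*E*y + H.
    Returns the tuple (A, B, C, D, E, H)."""
    k = 1.0 / (n * (1.0 - r_xy**2))
    a = k / sd_x**2
    b = -k * r_xy / (sd_x * sd_y)
    c = k / sd_y**2
    d = -(a * mean_x + b * mean_y)
    e = -(b * mean_x + c * mean_y)
    h = 1.0 / n + a * mean_x**2 + 2.0 * b * mean_x * mean_y + c * mean_y**2
    return (a, b, c, d, e, h)


excellent = pq_terms(90, 15.122222, 10.833333, 4.484115, 4.674398, -0.089145)
poor = pq_terms(90, 10.066667, 14.022222, 4.642317, 5.975836, -0.199112)

# Coefficients of P + Q, then of S_a^2 (P + Q), i.e. A', B', C', D', E', H'.
pq = tuple(t1 + t2 for t1, t2 in zip(excellent, poor))
S2A = 2889.504635
scaled = tuple(S2A * t for t in pq)

for name, value in zip(("A'", "B'", "C'", "D'", "E'", "H'"), scaled):
    print(f"{name} = {value:.4f}")
```

Small differences in the last decimal places from the printed coefficients are to be expected, since the printed values were rounded at intermediate steps.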








Therefore,

A′ = 3.160748,
B′ = .377586,
C′ = 2.417299,
D′ = −44.810802,
E′ = −33.669580,
H′ = 1060.210256.

The coordinates of the center of accuracy, x₀ and y₀, may now be determined by use of the following equations:

$$x_0 = \frac{B'E' - C'D'}{M}, \qquad y_0 = \frac{B'D' - A'E'}{M}, \qquad \text{where } M = A'C' - B'^2.$$

Hence,

$$M = (3.160748)(2.417299) - (.377586)^2,$$
$$M = 7.497902;$$

$$x_0 = \frac{(.377586)(-33.669580) - (2.417299)(-44.810802)}{7.497902},$$
$$x_0 = 12.751293;$$

$$y_0 = \frac{(.377586)(-44.810802) - (3.160748)(-33.669580)}{7.497902},$$
$$y_0 = 11.936823.$$

These coordinates, x₀ and y₀, are plotted in Figure I and their point of intersection is denoted by CA, the center of accuracy.

Step Six. Determine whether a region exists wherein the value of F(x′,y′) is significantly different from zero. To do this, first, establish the levels of significance at which the hypothesis H(x′,y′) will be accepted or rejected. These levels of significance will be denoted as w.05 and w.01. Second, compute the observed w; this is compared with the established levels of significance, w.05 and w.01, to determine if a region of significance exists.

The values of w.05 and w.01 are determined, respectively, from the following formulae:

$$w_{.05} = \frac{n_2}{n_1 F_{.05}}, \qquad w_{.01} = \frac{n_2}{n_1 F_{.01}},$$

where
n₁ = 1,
n₂ = (n − s) = N₁ + N₂ − 6 = 174.
Referring to Snedecor's F Tables*
F.01 = 6.80,
F.05 = 3.90.
Therefore,

$$w_{.01} = \frac{174}{6.80} = 25.588235,$$
$$w_{.05} = \frac{174}{3.90} = 44.615385.$$

The observed w is obtained from the formula

$$w_{obs.} = \frac{AH}{k^2(H + \xi_0^2 A)},$$

where

$$A = \frac{(A'C' - B'^2)(b_1c - bc_1)}{b_1^2 + c_1^2}, \qquad H = D'x_0 + E'y_0 + H',$$

$$k^2 = \frac{(b_1c - bc_1)^2}{b_1^2 + c_1^2}, \qquad \text{and} \qquad \xi_0 = \frac{a + bx_0 + cy_0}{k}.$$

The values a, b, and c are obtained from the equation of the line of non-significance:

a = .488724,
b = −.010686,
c = .281822.

The values a₁, b₁, and c₁ are obtained from the following formulae:

a₁ = D′c − E′b,
b₁ = A′c − B′b,
c₁ = B′c − C′b.

* George W. Snedecor, Statistical Methods. Ames, Iowa: Collegiate Press, Inc., 1938. P. 187.
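The center of accuracy and the constants a₁, b₁, c₁ follow mechanically from the quadratic coefficients. The sketch below illustrates the two formula sets; the function names are hypothetical, and the inputs are the values of A′ through E′, b, and c as printed in the text.

```python
def center_of_accuracy(A, B, C, D, E):
    """Point (x0, y0) at which A x^2 + 2B xy + C y^2 + 2D x + 2E y + H is least."""
    m = A * C - B**2
    x0 = (B * E - C * D) / m
    y0 = (B * D - A * E) / m
    return x0, y0


def diameter_constants(A, B, C, D, E, b, c):
    """Constants (a1, b1, c1) of the line a1 + b1*x + c1*y = 0."""
    return (D * c - E * b, A * c - B * b, B * c - C * b)


# Coefficients of S_a^2 (P + Q) and of the line of non-significance, as printed.
A1, B1, C1, D1, E1 = 3.160748, 0.377586, 2.417299, -44.810802, -33.669580
b, c = -0.010686, 0.281822

x0, y0 = center_of_accuracy(A1, B1, C1, D1, E1)
a1, b1, c1 = diameter_constants(A1, B1, C1, D1, E1, b, c)
print(f"CA = ({x0:.6f}, {y0:.6f})")
print(f"a1 = {a1:.6f}, b1 = {b1:.6f}, c1 = {c1:.6f}")
```

Because the center of accuracy minimizes the quadratic, it necessarily satisfies a₁ + b₁x₀ + c₁y₀ = 0, which gives a quick consistency check on hand computations.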





Hence,

$$a_1 = (-44.810802)(.281822) - (-33.669580)(-.010686),$$
$$a_1 = -12.988463;$$

$$b_1 = (3.160748)(.281822) - (.377586)(-.010686),$$
$$b_1 = .894803;$$

$$c_1 = (.377586)(.281822) - (2.417299)(-.010686),$$
$$c_1 = .132243.$$

Substituting values from the computed data,

$$A = \frac{[(3.160748)(2.417299) - (.377586)^2]\,[(.894803)(.281822) - (-.010686)(.132243)]}{(.894803)^2 + (.132243)^2},$$
$$A = 2.322705;$$

$$H = (-44.810802)(12.751293) + (-33.669580)(11.936823) + 1060.210256,$$
$$H = 86.906773;$$

$$k^2 = \frac{[(.894803)(.281822) - (-.010686)(.132243)]^2}{(.894803)^2 + (.132243)^2},$$
$$k^2 = .078600, \text{ and } k = .280356;$$

$$\xi_0 = \frac{(.488724) + (-.010686)(12.751293) + (.281822)(11.936823)}{.280356},$$
$$\xi_0 = 13.256464, \text{ and } \xi_0^2 = 175.733838.$$

w obs. may now be determined by substituting values from the computed data:

$$w_{obs.} = \frac{(2.322705)(86.906773)}{(.078600)\,[(86.906773) + (175.733838)(2.322705)]},$$
$$w_{obs.} = 5.187352.$$

Having determined w obs., it is now possible to determine if a region of significance exists. The rule to be followed is to accept the hypothesis, H(x′,y′), when the observed w is larger than w.05, and to reject it when it is less than w.01. Since 5.187352 (w obs.) is less than 25.588235 (w.01), the hypothesis H(x′,y′) is rejected, and a region of significance exists at the one per cent level. Therefore, excellent achievers are significantly better than poor achievers with respect to ability in subtraction, when the basic matching characters, mental age and chronological age, are controlled statistically.

Step Seven. Determine the shape of the region of significance known to exist at the one per cent level. The shape of this region is determined by comparing the value of w.01 k² with that of A. When w.01 k² is greater than A the region is a hyperbola; if w.01 k² is equal to A, the region is a parabola; and if w.01 k² is less than A, the region is an ellipse. The value of w.01 k² is

$$(25.588235)(.078600) = 2.011235,$$

and the value of A is 2.322705. Since 2.011235 (w.01 k²) is less than 2.322705 (A) the region of significance is an ellipse.
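Steps Six and Seven can be strung together in a few lines. The following is a sketch under stated assumptions: the function names `observed_w`, `decision`, and `region_shape` are invented for illustration, the F values are those quoted from Snedecor's tables, and the wording returned for an observed w lying between the two critical values is an assumption, since the text does not state a rule for that case. Because the printed figures were rounded at intermediate steps, the recomputed A and w differ slightly from the printed 2.322705 and 5.187352.

```python
import math


def observed_w(A1, B1, C1, D1, E1, H1, a, b, c):
    """Return (w_obs, A, k_squared) from the quadratic and line coefficients."""
    m = A1 * C1 - B1**2
    x0 = (B1 * E1 - C1 * D1) / m
    y0 = (B1 * D1 - A1 * E1) / m
    b1 = A1 * c - B1 * b
    c1 = B1 * c - C1 * b
    cross = b1 * c - b * c1
    A = m * cross / (b1**2 + c1**2)
    H = D1 * x0 + E1 * y0 + H1
    k_squared = cross**2 / (b1**2 + c1**2)
    xi0 = (a + b * x0 + c * y0) / math.sqrt(k_squared)
    w = A * H / (k_squared * (H + xi0**2 * A))
    return w, A, k_squared


def decision(w_obs, w_05, w_01):
    """Rule of Step Six; the middle case is not covered by the text."""
    if w_obs > w_05:
        return "accept H"
    if w_obs < w_01:
        return "reject H at the one per cent level"
    return "between the two critical values"


def region_shape(w_crit, k_squared, A):
    """Rule of Step Seven."""
    lhs = w_crit * k_squared
    if math.isclose(lhs, A):
        return "parabola"
    return "hyperbola" if lhs > A else "ellipse"


n2 = 90 + 90 - 6                       # degrees of freedom, n - s
w_01, w_05 = n2 / 6.80, n2 / 3.90      # F values as quoted from Snedecor

w_obs, A, k2 = observed_w(
    3.160748, 0.377586, 2.417299, -44.810802, -33.669580, 1060.210256,
    0.488724, -0.010686, 0.281822,
)
print(round(w_obs, 3), decision(w_obs, w_05, w_01), region_shape(w_01, k2, A))
```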


Step Eight. Construct a graph and locate upon it the coordinates of excellent and poor achievers in terms of mental age and chronological age, the line of non-significance, the diameter of the ellipse, the center of accuracy, and the ellipse, as the final step in the analysis.

First, construct a graph* on which the abscissa (x) represents mental age, and the ordinate (y) represents chronological age.

Second, locate upon this graph the coordinate for each excellent and each poor achiever in terms of mental age (x) and chronological age (y).

Third, locate the line of non-significance (oη). This line is represented by the equation a + bx + cy = 0. In the present analysis the equation is

$$F(x',y') = .488724 - .010686x' + .281822y' = 0.$$

The coordinates of x and y which determine the location of the line of non-significance (oη) are found by substituting known values of y in the equation F(x′,y′) = 0. In substituting, the values are as follows:

when y = 0, x = 45.73;
y = −1, x = 19.36;
y = −2, x = −7.01;
y = −3, x = −33.38.

* See Figure I.

Since all values used in this analysis are in terms of code scores, and since chronological age and mental age in Figure I are in terms of actual scores, the code scores appear opposite their respective raw scores. Therefore, while the line of non-significance (oη) has been plotted in Figure I according to code scores, its position is also exact in relation to actual scores.
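The plotting points for the line of non-significance come from solving a + bx + cy = 0 for x at chosen values of y. A minimal sketch follows; `x_on_line` is a hypothetical helper name, and a, b, c are the coefficients of F(x′,y′) from the text.

```python
def x_on_line(a, b, c, y):
    """Solve a + b*x + c*y = 0 for x (requires b != 0)."""
    return -(a + c * y) / b


a, b, c = 0.488724, -0.010686, 0.281822
points = [(x_on_line(a, b, c, y), y) for y in (0, -1, -2, -3)]
for x, y in points:
    print(f"y = {y:>2}, x = {x:.2f}")
```

The line is nearly horizontal on the graph, because the coefficient of x′ is small relative to that of y′; a change of one code unit in y moves x by about 26 code units.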

Fourth, locate the diameter of the ellipse (oξ). This line is represented by the equation a₁ + b₁x + c₁y = 0. In the present analysis the equation is

$$-12.988463 + .894803x' + .132243y' = 0.$$

The coordinates, x and y, which determine the diameter of the ellipse (oξ), are found by substituting known values of x in the foregoing equation. In substituting, the values are as follows:

when x = 10, y = 30.55;
x = 11, y = 23.79;
x = 12, y = 17.02;
x = 13, y = 10.25.

The diameter of the ellipse (oξ) always passes through the center of accuracy, and the distance from the center of accuracy to the point of intersection of the oξ and oη lines is represented by the value ξ₀, which is 13.256464. Since all values used in this analysis are in terms of code scores, and since chronological age and mental age in Figure I are in terms of actual scores, the code scores are presented opposite their respective raw scores. Therefore, while the diameter of the ellipse (oξ) has been plotted in Figure I according to code scores, its position is also exact in relation to actual scores.

Fifth, transform the coordinates, taking the line of non-significance (oη) as the new axis of ordinates, and the diameter of the ellipse (oξ) as the new axis of abscissae. These new axes need not be at right angles to each other. Values measured parallel to oξ are positive on the side of the line of non-significance (oη) where the value of F is positive, and negative on the other side of the line oη. The sign of ξ₀ indicates the direction along the line oξ which is considered positive or negative.

Sixth, locate the region of significance, the ellipse. The equation for the region of significance is

$$w_{.01}k^2\xi^2 - A(\xi - \xi_0)^2 - C\eta^2 - H = 0.$$

This equation is the locus of points on the edge of the region of significance. The value C is computed from

$$C = \frac{b_1c - bc_1}{b^2 + c^2} = \frac{(.894803)(.281822) - (-.010686)(.132243)}{(-.010686)^2 + (.281822)^2},$$
$$C = 3.188273.$$

Substituting this and the values previously computed, the equation becomes

$$2.011235\xi^2 - 2.322705(\xi - 13.256464)^2 - 3.188273\eta^2 - 86.906773 = 0;$$
$$-.311470\xi^2 + 61.581710\xi - 495.084637 - 3.188273\eta^2 = 0.$$

Dividing through by (−.311470), the equation becomes

$$\xi^2 - 197.713134\xi + 1589.509862 + 10.236212\eta^2 = 0.$$

Completing the square, the equation becomes

$$(\xi^2 - 197.713134\xi + 9772.620837) + 10.236212\eta^2 = 9772.620837 - 1589.509862,$$
$$(\xi^2 - 197.713134\xi + 9772.620837) + 10.236212\eta^2 = 8183.110927,$$
$$(\xi - 98.856567)^2 + 10.236212\eta^2 = 8183.110927.$$

Dividing through by (8183.110927), the equation becomes

$$\frac{(\xi - 98.856567)^2}{8183.110927} + \frac{\eta^2}{\dfrac{8183.110927}{10.236212}} = 1.$$
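The reduction of the quadratic in ξ and η to the standard form of an ellipse can be sketched as follows. The function name `ellipse_from_quadratic` is an assumption made for illustration; the four coefficients are those of the equation −.311470ξ² + 61.581710ξ − 495.084637 − 3.188273η² = 0 given in the text.

```python
import math


def ellipse_from_quadratic(p, q, r, s):
    """Reduce p*xi^2 + q*xi + r + s*eta^2 = 0 to
    (xi - center)^2 / semi_xi^2 + eta^2 / semi_eta^2 = 1.

    Returns (center, semi_xi, semi_eta); raises ValueError when the
    equation does not describe an ellipse.
    """
    center = -q / (2.0 * p)          # completing the square in xi
    rhs = center**2 - r / p          # constant moved to the right-hand side
    ratio = s / p                    # coefficient of eta^2 after dividing by p
    if rhs <= 0 or ratio <= 0:
        raise ValueError("coefficients do not describe an ellipse")
    return center, math.sqrt(rhs), math.sqrt(rhs / ratio)


center, semi_xi, semi_eta = ellipse_from_quadratic(
    -0.311470, 61.581710, -495.084637, -3.188273
)
print(f"center at xi = {center:.4f}")
print(f"semi-axis along xi  = {semi_xi:.4f}")
print(f"semi-axis along eta = {semi_eta:.4f}")
```

The semi-axes are measured in the oblique ξ, η coordinates of the fifth sub-step, not in the original x, y scales.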





Carrying out the division in the second denominator, the equation reads

$$\frac{(\xi - 98.856567)^2}{8183.110927} + \frac{\eta^2}{799.427657} = 1.$$

Transposing,

$$\frac{\eta^2}{(28.274152)^2} = 1 - \frac{(\xi - 98.856567)^2}{8183.110927}.$$

Extracting the square root of both sides,

$$\frac{\eta}{\pm 28.274152} = \sqrt{1 - \frac{(\xi - 98.856567)^2}{8183.110927}}.$$

Multiplying both sides by (±28.274152),

$$\eta = \pm 28.274152\sqrt{1 - \frac{(\xi - 98.856567)^2}{8183.110927}}.$$

The coordinates, η and ξ, which determine the shape of the ellipse, are found by substituting known values of ξ in the foregoing equation. In substituting, the values are as follows:

ξ = 10, η = ±4.81;
ξ = 13, η = …;
ξ = 18, η = …;
ξ = 28, η = ±17.46;
ξ = 98.856567, η = ±28.274152 (center of ellipse).

The boundary of the ellipse, which represents the region of significance, is plotted as follows: begin at the intersection of the lines oη and oξ and measure 10 positive units along the line oξ; here ξ = 10, and η = ±4.81. Now measure 4.81 units on either side of the line oξ and mark points. These points must be at the extremities of a line parallel to the line oη. Additional points, needed for locating the boundary of the ellipse, are determined in the same manner by using the remaining values of ξ and η. When an adequate number of points has been located, connect the points to form the boundary of the ellipse. The center of the ellipse is the point at which the ellipse is at its widest.

Seventh, interpret the regions of significance. The position of the region of significance, the ellipse, is plotted in Figure I. Above the line of non-significance the value is favorable to excellent achievers. This value is significant for that part of the population which is enclosed by the ellipse. Therefore, in general, pupils having mental ages of 109 to 192 months and chronological ages of 130 to 156 months, may be expected to do significantly better in subtraction if they are excellent achievers in division rather than poor achievers in division.

Step Nine. Since the hypothesis H(x′,y′) that there is no difference between the means of excellent and poor achievers with respect to ability in subtraction when adjusted for variation in basic characters of matching has been rejected, it is also of value to determine from the data the best estimate they provide with respect to the true difference between the means of excellent and poor achievers. This estimate can be obtained by substituting in the equation,

$$F(x',y') = .488724 - .010686x' + .281822y',$$

the values of the coordinates of the center of accuracy which in this case are:

x₀ = 12.751293, y₀ = 11.936823.

The equation then becomes

$$F(x_0, y_0) = .488724 - (.010686)(12.751293) + (.281822)(11.936823),$$
$$F(x_0, y_0) = 3.716529.$$

It can, therefore, be said that the best estimate of the true difference between the means of excellent and poor achievers in subtraction, when mental age and chronological age are controlled statistically, is 3.716529 in favor of excellent achievers. The variance of the best estimate of the true difference between the means of excellent and poor achievers may be estimated from:

$$V_F = \frac{S_a^2 F^2}{(n - s)\,S_b^2}.$$

All values necessary for solving this equation have been computed with the exception of S²b. The value of S²b is found by dividing F² by P + Q. Substituting computed values,

$$S_b^2 = \frac{[.488724 - .010686x' + .281822y']^2}{.001094x'^2 + 2(.000131)x'y' + .000837y'^2 + 2(-.015508)x' + 2(-.011652)y' + .366918}.$$





March,1942] APPLICATION OF JOHNSON-NEYMAN TECHNIQUE 


Substituting the value of x, for x’ and the value of y, for y’: 
S2 [.488724 — .010686 (12.751293) + .281822 (11.936823) |? 





vine [.co1094 (12.751293)? + 2(.000131) (12.751293) (11.936823) + .000837 (11.936823)? 





+ 2(—.105508) (12.751293) + 2(—.100652) (11.936823) + .366918] ; 


S*, = 72.601539. 


Substituting computed values, the variance 
of the best estimate of the true difference 
between the means of excellent and poor 
achievers is 


V_ — £2889-504635) (13.812589) 
. (180 — 6)(72.601539) ” 
Vp = 3.159391. 





The standard error of the best estimate of 
the true difference between the means of 
excellent and poor achievers is the square 
root of Vp or 1.777468. 
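
The arithmetic of this last step can be checked with a short sketch. The figures are those printed above; the variable names are illustrative only and are not the author's notation.

```python
import math

# Values taken from the worked substitution above (names are hypothetical).
numerator_first_factor = 2889.504635   # first factor of the numerator
numerator_second_factor = 13.812589    # second factor of the numerator
n_cases = 180                          # number of cases
n_constants = 6                        # quantity subtracted from the number of cases
denominator_factor = 72.601539         # remaining factor of the denominator

variance = (numerator_first_factor * numerator_second_factor) / (
    (n_cases - n_constants) * denominator_factor
)
standard_error = math.sqrt(variance)

print(f"variance = {variance:.6f}")
print(f"standard error = {standard_error:.6f}")
```

The two printed values should agree with the reported 3.159391 and 1.777468 to within rounding.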


SUMMARY 


In summary, the special merits of the
technique described here are:

1. The experiment has the property of
being self-contained.

2. No loss of information results, since all
observational values are used.

3. The populations for which it is permissible
to generalize from the sample information
are rigorously defined.

4. The problem of the test of significance
and the problem of estimation are both
solved.











THE APPLICABILITY OF THE SPEARMAN-BROWN FORMULA 
TO TEACHERS’ MARKS IN COLORADO STATE 
COLLEGE OF EDUCATION’ 


LORAINE BRUCE 


Pampa Senior High School 
Pampa, Texas 


1. PROBLEM 


The object of this study is to determine 
the applicability of the Spearman—Brown 
prophecy formula to college marks on the 
undergraduate and master’s levels in Colo- 
rado State College of Education. In order to 
do this, the reliability of teachers’ marks for 
two groups of students was computed: 
(1) students on the undergraduate level, and 
(2) students on the master’s level. The reli- 
ability of teachers’ marks for these two groups 
was predicted by the use of the Spearman— 
Brown formula. The correspondence between 
actual and predicted coefficients of reliability 
of teachers’ marks was determined for both 
groups. 


2. SUBJECTS 


The 209 students who first enrolled in 
Colorado State College of Education in Sep- 
tember, 1935 and in September, 1936 and 
who remained in school twelve consecutive 
quarters made up the first group for whom 
complete and comparable records of teachers’ 
marks were computed for each student. The 
second group was made up of the 230 stu- 
dents matriculating on the master’s level in 
Colorado State College of Education from 
June, 1936 to August, 1938 and who were in 
attendance a minimum of four quarters. 


3. PROCEDURES 


All records of teachers’ marks were com- 
puted in terms of point-hour ratios. The 
point-hour ratio was computed by dividing 
the total number of points by the total num- 
ber of quarter-hours. The points were found 
by multiplying the number of hours of each 
course by the point value of the letter grades. 
The point value of “A” is 5; of “B” is 4; of 
ar ae of —. 2: and . i gt 
oP RRTSLE Bx jG lene ce 
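
A minimal sketch of the point-hour ratio computation follows. Only the point values for "A" (5) and "B" (4) are legible in the text; the remaining entries of the table below are assumptions added for illustration, and the course list is hypothetical.

```python
# Point values: A and B are from the text; C, D, and F are ASSUMED for illustration.
POINT_VALUES = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}


def point_hour_ratio(courses):
    """Return total points divided by total quarter-hours.

    `courses` is a list of (quarter_hours, letter_grade) pairs.
    """
    total_points = sum(hours * POINT_VALUES[grade] for hours, grade in courses)
    total_hours = sum(hours for hours, _ in courses)
    return total_points / total_hours


# Hypothetical quarter: two 4-hour courses and one 2-hour course.
example_quarter = [(4, "A"), (4, "B"), (2, "A")]
ratio = point_hour_ratio(example_quarter)
print(ratio)
```

The example uses only the two grades whose values are given in the text, so its result does not depend on the assumed entries.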


The Spearman-Brown prophecy formula,²

$$r_n = \frac{n\,r_{12}}{1 + (n - 1)\,r_{12}},$$

was used in this study as a measure of the
effect of cumulated marks. In this formula
$r_n$ is the coefficient of reliability which would
probably result from any given number of
quarters, and $r_{12}$ is the coefficient for two
quarters. The probable error of each coefficient
estimated by this formula was computed
by Eugene Shen’s formula,³ which is

$$PE_{r_n} = \frac{.6745\,[n(1 - r_{12}^2)]}{\sqrt{N}\,[1 + (n - 1)\,r_{12}]^2}.$$

In this probable error formula, $r_{12}$ is the
original reliability coefficient, $n$ the number
of times the quarters were increased, and $N$
is the number of students.
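
The two formulas can be sketched in a few lines. The function names are illustrative, and the probable-error expression is the reading of Shen's formula adopted here. The example uses a coefficient of .664 for 209 students with the number of quarters doubled, figures reported in the results of this study.

```python
import math


def spearman_brown(r, n):
    """Predicted reliability when the amount of material is increased n times."""
    return n * r / (1 + (n - 1) * r)


def shen_probable_error(r, n, n_students):
    """Probable error of a Spearman-Brown estimate (Shen's formula as read here)."""
    return 0.6745 * n * (1 - r ** 2) / (math.sqrt(n_students) * (1 + (n - 1) * r) ** 2)


predicted = spearman_brown(0.664, 2)
probable_error = shen_probable_error(0.664, 2, 209)
print(f"{predicted:.3f} +/- {probable_error:.3f}")
```

Rounded to three places, these values agree with the predicted coefficient of .798 and probable error of .019 given in the results.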

Comparisons were made between the 
obtained and predicted coefficients of reliabil- 
ity of teachers’ marks by finding the differ- 
ences and probable errors of the differences 
between them. In the main, the formula⁴
which was used for the probable error of the
difference was

$$PE_{(r_1 - r_2)} = \sqrt{PE_{r_1}^2 + PE_{r_2}^2}.$$

This is the probable error of the difference 
between two coefficients of correlation calcu- 
lated from independent random samples. 
Since the coefficients were not derived from 
independent samples, the probable errors 
computed for their differences were not ap- 
plicable to this study; the values of these 
probable errors were too large to be reliable. 
If the probable errors thus computed indi- 
cated significant differences between the ob- 
tained and predicted coefficients, then com- 
putation by a formula which gave a smaller 


2 Odell, C. W. Statistical Method in Education. New York:
D. Appleton-Century Company, Inc., 1935. P. 210.
3 Garrett, H. E. Statistics in Psychology and Education.
New York: Longmans, Green and Company, 1938. P. 316.
4 Ibid., p. 281.




probable error would only show additional 
assurance. 
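
The short formula and the resulting critical ratio can be sketched as follows. The function names are illustrative. The figures are those reported later in the results: a predicted coefficient of .798 ± .019, an obtained coefficient of .726 ± .022, and a critical ratio of four as the requirement for a significant difference.

```python
import math


def pe_difference_short(pe_a, pe_b):
    """Probable error of a difference, treating the two coefficients as independent."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


def critical_ratio(difference, pe_diff):
    """Difference divided by its probable error."""
    return difference / pe_diff


pe_diff = pe_difference_short(0.019, 0.022)
ratio = critical_ratio(0.798 - 0.726, pe_diff)
is_significant = ratio >= 4  # criterion of four used in this study

print(f"PE diff = {pe_diff:.3f}, critical ratio = {ratio:.2f}, significant: {is_significant}")
```

Because the formula ignores any correlation between the two coefficients, the probable error it gives is an upper bound when that correlation is positive.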

The z-transformation was applied to the 
coefficients predicted from the correlation be- 
tween the first and third quarters of under- 
graduate work, in order to determine whether 
this computation would change the differences 
between the obtained and predicted coeffi- 
cients evaluated in terms of the short formula. 
According to Lindquist,5 this method for de- 
termining the significance of a difference is 
valid only if the coefficients are obtained 
from independent random samples. 
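
A minimal sketch of the z-transformation test follows. It assumes the usual standard error of z, 1/sqrt(N - 3), which is not stated in the text, and it treats the two coefficients as if they came from independent samples, which is precisely the condition noted above as unmet in this study. The function names are illustrative.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return math.atanh(r)


def z_difference_ratio(r_a, r_b, n_a, n_b):
    """Difference between two transformed coefficients over its standard error.

    Assumes independent samples of sizes n_a and n_b.
    """
    standard_error = math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))
    return (fisher_z(r_a) - fisher_z(r_b)) / standard_error


# Predicted .798 and obtained .726, each based on 209 students.
z_ratio = z_difference_ratio(0.798, 0.726, 209, 209)
print(f"difference in standard-error units: {z_ratio:.2f}")
```

Dividing this ratio by .6745 expresses it in probable-error units, the scale on which the critical ratios of this study are reported.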

No test for the significance of a difference 
between obtained and predicted correlation 
coefficients has yet been devised. Such a test 
is especially needed by the research student 
in education. There is a longer formula for 
the probable error of the difference between 
two coefficients in which the two arrays are 
different, but in which the arrays are corre- 
lated with one another, which is probably 
more suited to this study. This formula⁶ is

$$PE_{(r_{12} - r_{34})} = \sqrt{PE_{r_{12}}^2 + PE_{r_{34}}^2 - 2\,r_{r_{12} r_{34}}\,PE_{r_{12}}\,PE_{r_{34}}},$$

in which

$$
\begin{aligned}
r_{r_{12} r_{34}} = \frac{1}{2(1 - r_{12}^2)(1 - r_{34}^2)} \Big\{ & [(r_{13} - r_{12} r_{23})(r_{24} - r_{23} r_{34})] + [(r_{14} - r_{13} r_{34})(r_{23} - r_{13} r_{12})] \\
& + [(r_{13} - r_{14} r_{34})(r_{24} - r_{14} r_{12})] + [(r_{14} - r_{12} r_{24})(r_{23} - r_{24} r_{34})] \Big\}.
\end{aligned}
$$

In this formula for the probable error, $r_{12}$ is
for the obtained r and $r_{34}$ for the predicted r.
The usefulness of this formula was impaired
by the fact that the data for the correlations
$r_{13}$, $r_{14}$, $r_{23}$, and $r_{24}$ were not available. The
four unknown r’s were assumed to be equal
to the average of the obtained and predicted
coefficients; this is an unsupported assumption.
In several instances this longer formula
was used in order to determine its effect on
the significance of differences obtained for
the short formula. The differences for the
other coefficients were either reliable by the
short formula or were so small that the use
of the long formula would not have shown
them to be reliable.


4. RESULTS


Four product-moment coefficients of correlation
for undergraduates were substituted
in the Spearman-Brown prophecy formula
and were obtained from the point-hour ratios
for the following quarters: (1) first and third
quarters; (2) first and second quarters;
(3) the sum of the first and second and the
sum of the third and fourth quarters; and
(4) the sum of the first and third and the
sum of the second and fourth quarters. These
reliability coefficients were: $r_{13}$ = .664 ±
.026, $r_{12}$ = .651 ± .027, $r_{(1+2)(3+4)}$ = .725
± .022, and $r_{(1+3)(2+4)}$ = .766 ± .019.

In order to study the reliability of marks
throughout the twelve quarters of attendance,
ten other coefficients of correlation between
single quarters were computed. These
correlations range from .478 ± .036 between
the tenth and twelfth quarters to .731 ± .022
between the fifth and sixth quarters. The
highest correlation was found for the ratios
of the second and third quarters of the sophomore
year; the next two highest were the two
which were substituted in the Spearman-Brown
formula. The lowest coefficients were
between the junior and senior years.

5 Lindquist, E. F. Statistical Analysis in Educational Research.
Houghton Mifflin Company, 1940. P. 217.
6 … Statistical Proce… : McGraw-Hill
Book Company, Inc., …


Predicted r’s were found by substituting in
the Spearman-Brown formula $r_{13}$ for $r_{12}$ and
two, three, four, five, and six, respectively,
for $n$. Similar r’s were found by replacing $r_{13}$
by $r_{12}$. The coefficients $r_{(1+2)(3+4)}$ and
$r_{(1+3)(2+4)}$ were substituted in the prophecy
formula, and one and one-half, two, two
and one-half, and three were successively
substituted for $n$ for each of these coefficients.
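
The longer probable-error formula for correlated coefficients, described under Procedures, can be sketched as below. The function names are illustrative. The four unavailable correlations are set to the average of the obtained and predicted coefficients, the unsupported assumption the study itself adopts. With rounded inputs the sketch is not expected to reproduce the tabled values exactly.

```python
import math


def correlation_between_coefficients(r12, r34, r13, r14, r23, r24):
    """Correlation between two coefficients, r12 and r34, from the same subjects."""
    numerator = (
        (r13 - r12 * r23) * (r24 - r23 * r34)
        + (r14 - r13 * r34) * (r23 - r13 * r12)
        + (r13 - r14 * r34) * (r24 - r14 * r12)
        + (r14 - r12 * r24) * (r23 - r24 * r34)
    )
    return numerator / (2 * (1 - r12 ** 2) * (1 - r34 ** 2))


def pe_difference_long(pe12, pe34, r_between):
    """Probable error of the difference between two correlated coefficients."""
    return math.sqrt(pe12 ** 2 + pe34 ** 2 - 2 * r_between * pe12 * pe34)


obtained, predicted = 0.726, 0.798
assumed = (obtained + predicted) / 2  # stand-in for the four unavailable r's

r_between = correlation_between_coefficients(
    obtained, predicted, assumed, assumed, assumed, assumed
)
long_pe = pe_difference_long(0.022, 0.019, r_between)
short_pe = pe_difference_long(0.022, 0.019, 0.0)  # reduces to the short formula

print(f"r between coefficients = {r_between:.3f}")
print(f"long-formula PE = {long_pe:.4f}, short-formula PE = {short_pe:.4f}")
```

Whenever the correlation between the two coefficients is positive, the long formula yields a smaller probable error than the short formula and therefore a larger critical ratio.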


The product-moment coefficients of cor- 
relation between the total point-hour ratios 
for two series of three, four, five, and six 
quarters each were computed. Each series 
was compared with the predicted coefficients 
based on the initial r’s to which each cor- 
responded. 

The results of substituting $r_{13}$ in the
prophecy formula were equivocal. These results
are shown in Table Ia. The difference
between the predicted coefficient, .798 ±
.019, which was the result of substituting two
for $n$, and the obtained $r_{(1+2)(3+4)}$ = .726
± .022, was .072 ± .029; this is not a significant
difference. When the long formula







JOURNAL OF EXPERIMENTAL EDUCATION 


[Vol. 10, No. 3 


TABLE I 


COMPARISONS OF OBTAINED AND PREDICTED COEFFICIENTS OF RELIABILITY OF POINT-HOUR RATIOS
OF 209 UNDERGRADUATE STUDENTS WHO FIRST ENROLLED IN COLORADO STATE COLLEGE
OF EDUCATION IN SEPTEMBER, 1935 AND IN SEPTEMBER, 1936, AND WHO
WERE IN ATTENDANCE FOR TWELVE CONSECUTIVE QUARTERS


                    r obt.    r pred.    Diff.*    PE diff.    Diff./PE diff.


a. Reliabilities as Predicted from $r_{13}$ = .664 ± .026


r(1+2)(3+4)
r(1+2+5)(3+4+7)
r(1+2+5+6)(3+4+7+8)
r(1+2+5+6+9)(3+4+7+8+11)
r(1+2+5+6+9+10)(3+4+7+8+11+12)

b. Reliabilities as Predicted from $r_{12}$ = .651 ± .027

r(1+3)(2+4)
r(1+3+5)(2+4+6)
r(1+3+5+7)(2+4+6+8)
r(1+3+5+7+9)(2+4+6+8+10)
r(1+3+5+7+9+11)(2+4+6+8+10+12)


* r pred. − r obt.


«013 


. 072 . 029 
. 024) 
. 122 . 026 
—. 007 .015 47 
. 072 .018 
. 148 . 020 


2.48 
**(3.00) 
4.69 


. 028 . 82 
. 025 3. 84 
-020) **( 4.80) 
.018 2.33 
. 021 4.71 


. 016 4.12 


** Results obtained from the use of the long formula.


for the probable error of the difference be- 
tween the r’s was computed, the difference 
still did not indicate complete reliability. The 
critical ratios of the differences between the 
obtained and predicted scores for the totals 
of two quarters and of four quarters indicate 
that there is no significant difference. In 
these two cases, the correspondence is suffi- 
ciently close to assume that the prediction 
formula could be used to determine the reli- 
ability of amalgamated quarters of teachers’ 
marks. In the cases of the amalgamated 
marks of three, five, and six quarters, the 
critical ratios of four and larger indicate with
positive assurance that the estimated reliabilities
are too large. The z-transformation was
applied to the results derived from $r_{13}$. The
results did not differ appreciably whether the
differences between the obtained and predicted
coefficients were evaluated in terms of the
short formula or in terms of the
z-transformation.

When $r_{12}$ was used as the reliability coefficient,
the critical ratios indicate practically
the same results as when $r_{13}$ was used for the
initial r. These results are shown in Table Ib.


In the case of two and four quarters, the 
prophecy formula can be used to predict re- 
liability coefficients, and in the other three 
cases, the formula overpredicts decidedly. In 
the case of three quarters, the critical ratio 
was slightly under the four required for a 
significant difference. When the long formula 
for the probable error of the difference was 
used, the critical ratio was increased to 4.80. 
This ratio indicates virtual certainty that 
the estimated reliability was overpredicted. 

The coefficients predicted from $r_{(1+2)(3+4)}$,
which are shown in Table IIa, indicate that
the only significant difference was that for
six quarters. For the total of four quarters,
the obtained coefficient was larger than the
predicted coefficient by an almost reliable
difference. With $r_{13}$ as the initial r, the obtained
coefficient for a total of four quarters
was larger than the predicted reliability, but
the difference was not reliable. When
$r_{(1+3)(2+4)}$ was used as the initial r, as is
shown in Table IIb, the coefficients of the
totals for three, five, and six quarters were
overpredicted by reliable or almost reliable
differences; all of these coefficients were






TABLE II 


COMPARISONS OF OBTAINED AND PREDICTED COEFFICIENTS OF RELIABILITY OF POINT-HOUR RATIOS
OF 209 UNDERGRADUATE STUDENTS WHO FIRST ENROLLED IN COLORADO STATE COLLEGE
OF EDUCATION IN SEPTEMBER, 1935 AND IN SEPTEMBER, 1936, AND WHO
WERE IN ATTENDANCE FOR TWELVE CONSECUTIVE QUARTERS


T obt. 


1 


a. Reliabilities as Predicted from $r_{(1+2)(3+4)}$ = .725 ± .022


r(1+2+5)(3+4+7)
r(1+2+5+6)(3+4+7+8)
r(1+2+5+6+9)(3+4+7+8+11)
r(1+2+5+6+9+10)(3+4+7+8+11+12)


b. Reliabilities as Predicted from $r_{(1+3)(2+4)}$ = .766 ± .019


r(1+3+5)(2+4+6)
r(1+3+5+7)(2+4+6+8)
r(1+3+5+7+9)(2+4+6+8+10)
r(1+3+5+7+9+11)(2+4+6+8+10+12)

* r pred. − r obt.


‘ : . 057 . 016 
. 018 ‘ **( 012) 


; Diff. 
I Pred. _ Diff.* PE diff. “pr aif, 
3 4 5 6 
. 065 
—. 054 


. 028 2. 32 


=, 


3.18 
**( 3.86) 
1.65 


-021 5. 19 


. 025 
. 019) 
. 018 


. 020 


3.16 
**( 4. 16) 
1.56 
4.35 


3. 56 
**( 4. 75) 


** Results obtained from the use of the long formula. 


overpredicted by reliable differences when 
the long formula was used. 


The divergence from normality was meas- 
ured for all the distributions involved in 
computing the four reliability coefficients for 
the undergraduate ratios; the same measures 
were made for two of the distributions which 
were made up of totals of six quarters. With 
the exception of the kurtosis of the distribu- 
tion of the first quarter and both the skew- 
ness and kurtosis of the distribution of the 
third quarter, the divergence of the distribu- 
tions from the ideal normal curve is the result 
of chance fluctuations. Because of departure 


of the distributions from normality, the obtained
and predicted coefficients based on
$r_{13}$, and to some extent those based on $r_{12}$,
yield less reliable results than coefficients
based on more symmetrical distributions.

On the master’s level, the coefficients of
reliability were computed between the point-hour
ratios for the following quarters of work:
the first and third quarters, $r_{13}$ = .393 ±
.038, and the first two quarters, $r_{12}$ = .448
± .036. These two coefficients were substituted
in the prophecy formula and two was
substituted for $n$. The differences between the
obtained and predicted coefficients are not
reliable. The results are shown in Table III.


TABLE III 


COMPARISONS OF OBTAINED AND PREDICTED COEFFICIENTS OF RELIABILITY OF POINT-HOUR RATIOS
OF 230 STUDENTS ON THE MASTER’S LEVEL AT COLORADO STATE COLLEGE OF EDUCATION


                 r obt.         r pred.        Diff.*    PE diff.    Diff./PE diff.


a. Reliability as Predicted from $r_{13}$ = .393 ± .038

r(1+2)(3+4)      .461 ± .035    .564 ± .039    .103                  1.98


b. Reliability as Predicted from $r_{12}$ = .448 ± .036

r(1+3)(2+4)                     .619 ± .034                          1.40


* r pred. − r obt.






5. CONCLUSIONS 


According to the results of this study, the 
reliability of teachers’ marks, excepting those 
for the amalgamated ratios of two and four 
quarters, cannot be successfully predicted by 
the use of the Spearman—Brown prophecy 
formula. Evidently teachers’ marks do not
satisfy the assumptions upon which the 
prophecy formula is based as to comparable 
and similar forms of measures, and conse- 
quently its application is of doubtful validity. 

The results indicate that reliability coeffi- 
cients of amalgamated quarters of marks may 
be predicted for totals of two and four quar- 
ters; the use of any other totals would result 
in overprediction. 

However, the results obtained in this in- 
vestigation should be accepted with some 
reservations because: (1) the students were 
not a random sample of college students; 




(2) as the period of time during which the 
point-hour ratios were obtained extended over 
four years, the changes in the student may 
have rendered the use of the Spearman— 
Brown formula inappropriate; (3) several 
distributions of point-hour ratios departed 
more widely from normality than could be 
accounted for by chance; (4) no adequate
formula for computing the probable error of 
the difference between obtained and predicted 
coefficients was available for use with the 
data of this study. It is highly probable that 
if such a formula could have been applied, a 
larger number of reliable differences would 
have been found. On account of different 
grading systems and methods of determining 
marks in different educational institutions, it 
is not likely that a random sample of college 
students could have been used satisfactorily 
in such an investigation. 





THE RELATION OF PRIMARY MENTAL ABILITIES TO 
SCHOLASTIC SUCCESS IN PROFESSIONAL SCHOOLS 


DEWEY B. STUIT and HARRY H. HUDSON
University of Iowa 


The present study is a part of a larger 
primary abilities testing program sponsored 
by the American Council on Education in ten 
universities in the spring of 1939. The pri- 
mary purpose of that investigation was to 
determine the relation of primary ability test 
scores to vocational choices. The chief out- 
come of the study was the publication by 
Adkins¹ of vocational group profiles on the
primary mental abilities for professional stu- 
dents in 12 occupational fields. No attempt 
was made to study the relationship between 
performance in the tests and academic 
achievement as measured by grades. The 
major purpose of the present study is to 
present data bearing on the latter point and 
also some supplementary evidence on the 
group profiles. 


The University of Iowa’s contribution to
the American Council on Education study
was to administer the A. C. E. Tests for Primary
Mental Abilities² to students in engineering,
medicine, and journalism. The
engineering group was composed of 27 students,
and the journalism and medical groups
were each composed of 29 students, making
a total of 85 subjects for the study. The
engineering and journalism grade point averages³
were based on seven semesters of work
and the medical averages were compiled from
five semesters of work. All zero order correlations
were computed by the Pearson
product-moment method and all multiple
correlations by the Wherry-Doolittle procedure.
The significance of correlation coefficients
was determined by use of the table of
values required for significance as presented
by Lindquist.⁴

1 Adkins, Dorothy C. “The Relation of Primary Mental
Abilities to Vocational Choice.” American Council on Education
Studies, Series 5, Vol. 4, No. 2. Pp. 39-53.

2 For a description of the tests and the factors measured
see Thurstone, L. L. Manual of Instructions: Tests for
Primary Mental Abilities. Washington, D. C.: American
Council on Education, 1938. Pp. 12.

3 The A, B, C, D, E grading system is used. In computing
grade point averages, the following weights were employed:
A=4, B=3, C=2, D=1, E=0.

4 Lindquist, E. F. Statistical Analysis in Educational Research.
New York: Houghton Mifflin, 1940. P. 212.


The correlations between the factor scores 
and grade point averages are presented in 
Table I. Significant correlations at the 5 per- 
cent level were found between the Verbal, 
Memory, Induction, and Deduction factors 
and engineering grade point averages. For 
the journalism group, significant correlations 
were found between the Perception and 
Verbal factors and the criterion. No signifi- 
cant correlations were found between the 
factor scores and the criterion for the med- 
icine group. 

The correlations presented in Table I may 
seem somewhat confusing and illogical. For 
example, one might have expected the Spatial 
factor to correlate more highly with scholastic 
success in engineering and the Memory factor 
to correlate more highly with grades in med- 
icine. In the interpretation of these data two 
important facts should be remembered. 
First, the groups are very small; consequently 
only limited confidence can be placed in the 
correlations. Second, since the members of 
each group are advanced students who have 
done reasonably satisfactory work, all may 
have sufficient amounts of the vitally impor- 
tant factors to insure their success, and con- 
sequently individual differences may not be 
associated with variations in achievement. 
There is also the possibility, of course, that 
defects in the tests or the criteria of scholastic 
success may be responsible for the low cor- 
relations. 

Using the Wherry—Doolittle test selection 
method a multiple correlation of .614 was 
found between the criterion and the factor 
scores for the engineering group. Included in 
the selection was the Verbal factor, which 
correlated .577 with the criterion, and the 
Memory factor, which increased the multiple 
coefficient to .614. Additional factors added 
more chance error than they did validity to 
this coefficient. 

For the journalism group, it was found 
that the Verbal factor correlated .505 with 
the criterion, the Memory factor increased 
the multiple coefficient to .523, the Perception 




TABLE I


CORRELATIONS BETWEEN FACTOR SCORES AND GRADE POINT AVERAGES IN
THREE PROFESSIONAL SCHOOL GROUPS


                                    Factor
Group           P       N       V       S       M       I       D

Engineering    .150    .228    .577    .178    .563    .400
Journalism     .427    .318    .505    .015    .232    .321
Medicine       .353    .172    .151    .098   −.013   −.219

P=Perception   N=Number   V=Verbal   S=Spatial   M=Memory   I=Induction   D=Deduction


factor to .567, and the Deduction factor to
.594. Additional factors added no validity to
this multiple coefficient of correlation of .594
between the factors and the criterion.

A multiple coefficient of correlation of .416
was found between the factors and the criterion
for the medicine group. The Perception
factor correlated .353 with the criterion, the
Induction factor increased the correlation to
.398, and the inclusion of the Deduction
factor in the selection yielded the maximum
multiple coefficient of .416, additional factors
adding no validity.

These multiple correlations suggest that
tests of primary mental abilities possess some
promise in the prediction of scholastic success
in professional schools. When it is remembered
that all of the students included
were nearing the completion of their work
and thus constituted a homogeneous population,
it seems a bit surprising that the correlations
should be as high as they have been
found to be in this investigation.

To a vocational counselor the group profiles
are of as much interest as the correlations
between scores in the tests and criteria
of success. These profiles for engineering,
journalism, and medicine are presented in
Figures 1, 2, and 3 respectively. The vertical
axis of the profiles represents a standard
score scale, based on the performance of University
of Chicago students. A standard score
of 1.00, for example, represents a performance
equal to that of University of Chicago students
one standard deviation above the mean,
or the equivalent of the 84th percentile.

The engineering group showed the highest
single average factor score on the Deduction
factor, with the Spatial, Number, Perception,
and Induction factors being slightly below
this peak, and the Verbal and Memory factors
being only slightly above the average scores
made by the University of Chicago freshmen
on these factor tests. This is an important


point for consideration, since, it will be re- 
called, the Verbal and Memory factors cor- 
related highest with the criterion for this 
group. It should be noted that, although ab- 
solute scores may be high or low, the correla- 
tion does not necessarily correspond in rela- 
tive magnitude, due to the highly selective 
nature of the group. It appears that the 
Verbal and Memory factors are the most pre- 
dictive of academic success in engineering, 
since they correlate highest with the grade 
point average; but, once the professional 
group has been selected, the abilities repre- 
sented by these factors contribute little in 
the discrimination of the group from the pop- 
ulation average. This fact is important in the 
practice of educational as well as of voca- 
tional counseling. 


The journalism group showed the highest 
single average factor score on the Perception 
factor, with the Verbal and Number factors 
being slightly below this peak. The Deduc- 
tion, Memory, Spatial, and Induction factors 
were slightly below the normative scores for 
these factors. It is significant to note that 
high achievement in those abilities represented 
by the Deduction, Memory, Spatial, and In- 
duction factors is apparently not essential 
for professional success in this field, since
scores for this professional group all fell be- 
low the average scores of University of Chi- 
cago students. The implications of this find- 
ing for counseling lie in the fact that low 
scores on such abilities would not eliminate 
the field as a vocational possibility if other
pertinent qualities were present. It should be
noted that the deviations for these factors 
are not as great as is the deviation above the 
norm for the Perception factor. 


The item of greatest significance about the 
medical group profile is the fact that all of 
the factor scores are well above the average 
for a normative population. The Perception 
factor appears to have the highest discriminative
value, followed by the Spatial and
Number factors. This may be interpreted as
indicating that the medical profession requires
a rating above the average on each of the
factors, with an emphasis on the Perception
factor. It is interesting to observe that the
Induction factor was above the average score
for a normal population, although there was
found a slightly negative correlation between
this factor score and the criterion.


GROUP PROFILES ON PRIMARY MENTAL ABILITIES

Figure 1. Engineering
Figure 2. Journalism
Figure 3. Medicine

For the vocational counselor and the per- 
sonnel technician, the discovery of charac- 
teristic profiles for various professional groups 
is a significant finding. It is important to 
discover that the factor scores based upon 
normative data for various groups show char- 
acteristic patterns, differentiating the profes- 
sional groups from the average population in 
terms of the particular abilities required for 
success in the various fields. This fact sup- 
ports the contention that unitary measures of 




intelligence are not sufficient alone to char- 
acterize the mental ability requirements for 
a professional group. After a minimum level 
of mental ability has been established, 
measures other than the total scores on intel- 
ligence tests are necessary for discrimination 
between the individuals in the field. 


The results of this study, as well as those 
reported by other investigators, suggest that 
tests for primary mental abilities will have 
some value in educational and vocational 
counseling. Whether they will replace present 
vocational aptitude tests remains a problem 
for further research. No doubt many of these 
aptitude tests measure the same or nearly the 
same fundamental abilities. If these abilities 
can be identified by the technique of factor 
analysis, and tests constructed to measure 
them, a real service will have been rendered 
to scientific vocational guidance. 





RELATIVE DIFFICULTY OF TEST ITEMS OF THE REVISED 
STANFORD-BINET: AN ANALYSIS OF RECORDS 
FROM A LOW INTELLIGENCE GROUP 


ARTHUR L. RAUTMAN 


Northern Wisconsin Colony and Training School 
Chippewa Falls, Wisconsin 


A. INTRODUCTION 


Because workers in the field of mental 
measurement have long recognized the fact 
that an individual’s behavior during a psy- 
chometric examination furnishes more infor- 
mation concerning the examinee’s abilities 
and his capacity for future development than 
can be expressed by a single numerical sym- 
bol such as the mental age rating or the in- 
telligence quotient, attempts have been made 
to interpret test performance more compre- 
hensively. Before a complete and qualitative 
evaluation of performance on a psychometric 
examination is possible, however, there must 
be available to the psychologist certain de- 
tails relating both to the examinee and to the 
test, including, among other factors, informa- 


tion concerning the relative difficulty of the 
individual test items. For the purpose of 


comparing the individual items, research 
workers have subjected the tests of the 
Stanford-Binet examination to various spe- 
cial types of analysis. Growdon (2) has at- 
tempted to re-evaluate the test items of the 
Stanford-Binet in order to convert this test 
into a point scale of performance. A qualita- 
tive and quantitative study of the responses 
to individual test items of this same examina- 
tion was made by Martinson and Strauss (3) 
and Strauss and Werner (4) who compared 
the responses of normal children with those 
of mentally defective children. Gillette (1) 
has further attempted an item analysis of the 
test responses of a group of children brought 
to a guidance clinic in order to determine the 
relative difficulty of tests within year levels 
six through twelve to enable her to gain a 
more comprehensive understanding of test 
performance. Since work with special groups 
such as the mentally defective has seemed to 
show that certain test items may be unsuit- 
able under actual clinical conditions, we have 
attempted to determine by means of a sta- 
tistical analysis the relative difficulty of tests 


within each year level of the Revised 
Stanford-Binet, Form L, from Year II 
through Year XI for a group of subjects 
having a low intelligence rating. This paper 
summarizes certain results gained from this 
analysis. 

The data are based upon the test records 
of patients who were examined by the psy- 
chologist at a state institution for the care 
and training of mental defectives. All exam- 
inations were conducted by the writer, and 
no records of patients with special patholog- 
ical conditions and handicaps are included 
since the Stanford-Binet test is given only 
to patients who are able to give a reliable 
response on this type of examination. 

The study includes data from one thousand 
consecutively examined patients having an 
intelligence quotient rating of less than 80. 
The range in intelligence quotient was from 
13 through 79, and the mean intelligence 
quotient for the group was 49.75, sigma 
15.27. In mental age the group ranged from 
2 years to 11 years, 10 months, the average 
mental age being 6 years, 9.3 months (sigma 
2 years, 4.9 months). The range in chrono- 
logical age was high, from 3 years to 81 
years, and the average chronological age for 
the group was 19.63 years (sigma 9.67 
years). 

Since the data studied are drawn from a 
special group such as one would ordinarily 
find in an institution for mental defectives,
the findings of this study will necessarily be 
limited by this fact. However, although the 
group surveyed is not representative of the 
population in general, it probably is typical 
of the inmate populations usually found in 
public and private institutions for mentally 
defective patients. 


B. PROCEDURE 


In this study we have been concerned with 
the relative difficulty of the various tests 
within each particular year level and have 




made no attempt to evaluate the placement 
of the items at a specific level. The relative 
difficulty of the tests within a year level was 
determined by calculating and comparing the 
percentage of individuals who were able to 
pass each test. In this manner the relative 
difficulty of the tests at each year level from 
Year II through Year XI was determined for 
the group having the same mental age as the 
test level under consideration; these groups 
are called the “Mental Age groups.” In addi- 
tion, the relative difficulty of the tests at each 
of these year levels was determined for the 
“Total group;” i.e., for the group consisting 
of all individuals to whom a particular test 
year had been given. 
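
The difficulty values described above can be illustrated with a short sketch. The pass/fail records below are hypothetical, and the function names are illustrative assumptions; the study itself reports only the resulting percentages.

```python
from typing import Dict, List, Tuple


def percent_passing(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return the percentage of examinees passing each test (True = pass)."""
    return {name: 100.0 * sum(passes) / len(passes)
            for name, passes in results.items()}


def rank_by_difficulty(results: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """List tests from easiest (highest percentage passing) to hardest."""
    pct = percent_passing(results)
    return sorted(pct.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records for four examinees on three tests of one year level.
example = {
    "Form board": [True, True, True, True],
    "Block tower": [True, True, True, False],
    "Parts of body": [True, False, True, False],
}
ranking = rank_by_difficulty(example)
```

The same ranking would be computed twice for each year level, once for the Mental Age group and once for the Total group.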

After the relative difficulty of the tests 
within a year level had thus been established 
by comparing the percentage passing each 
test within a given test year, the reliability 
of the differences between these difficulty 
values (percentages passing) was determined. 
Since within each year level the percentages 
for the various tests were obtained on the 
same group of subjects, the standard error of 
the differences between two percentages was 
calculated by means of the complete formula: 

$$\sigma_{\text{diff.}} = \sqrt{\sigma_{p_1}^{2} + \sigma_{p_2}^{2} - 2\, r_{p_1 p_2}\, \sigma_{p_1} \sigma_{p_2}}$$

For both the Mental Age and the Total 
groups the data for each year are presented 
in tabular form, the tests in each case being 
identified by name and serial number, and 
listed in order of difficulty. The percentage 
of individuals passing each test is indicated, 
and there is also shown the serial number of 
each test from which a specific test is reliably 
different; that is, cases in which the critical 
ratio between the two tests was found to be 
3 or more. 
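
A minimal sketch of the critical ratio for two percentages obtained on the same group follows. It assumes that the standard error of a single percentage is the usual sqrt(pq/N), which the text does not state explicitly. The correlation r between the two tests is treated as a given input, since no correlation values are reported in the study. The figures in the example are hypothetical.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a proportion p (0 to 1) observed on n cases (assumed form)."""
    return math.sqrt(p * (1.0 - p) / n)


def critical_ratio_correlated(p1: float, p2: float, r: float, n: int) -> float:
    """Critical ratio for two proportions measured on the same n examinees.

    r is the correlation between the two proportions; it is an input here
    because the study does not report the values it used.
    """
    s1 = se_proportion(p1, n)
    s2 = se_proportion(p2, n)
    se_diff = math.sqrt(s1 ** 2 + s2 ** 2 - 2.0 * r * s1 * s2)
    return (p1 - p2) / se_diff


# Hypothetical illustration: 80% and 60% passing among 100 examinees,
# with an assumed correlation of 0.30 between the two tests.
cr = critical_ratio_correlated(0.80, 0.60, r=0.30, n=100)
reliable = abs(cr) >= 3.0  # the study's criterion: critical ratio of 3 or more
```

A positive correlation shrinks the standard error of the difference, so the same pair of percentages yields a larger critical ratio than it would for independent groups.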

Since psychometric work with special 
groups like the mentally defective has often 
seemed to indicate that performance on cer- 
tain parts of the Stanford—Binet test is more 
affected by life and school experiences than 
is performance on other test items, an 
attempt was made to study the influence of 
life age upon individual tests at each of the 
year levels included in this study. For this 
purpose it was necessary to compare, at each 
year level, the performance of groups of 
patients having approximately the same 
mental age but differing in chronological age. 
Such groups were obtained by selecting all 
individuals with a mental age corresponding 
in number to the particular year being
analyzed, and, in addition, those whose mental
age is included within the range from six 
months below to six months above this level. 
Within each of these selected groups the test 
records were arranged according to the ex- 
aminees’ chronological ages, and then divided 
on the basis of chronological age into three 
sub-groups, a “younger,” a “middle,” and an 
“older” group. 
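
The selection and grouping just described might be sketched as follows. The record layout and the cut points are assumptions, because the text does not say how remainders or tied ages were handled. The data are hypothetical.

```python
from typing import List, Tuple

# (identifier, mental age in months, chronological age in months)
Record = Tuple[str, int, int]


def select_mental_age_band(records: List[Record], low: int, high: int) -> List[Record]:
    """Keep examinees whose mental age in months lies within [low, high]."""
    return [r for r in records if low <= r[1] <= high]


def split_by_chronological_age(records: List[Record]):
    """Sort by chronological age and cut into younger, middle, and older thirds."""
    ordered = sorted(records, key=lambda r: r[2])
    n = len(ordered)
    first_cut, second_cut = n // 3, (2 * n) // 3  # assumed cut points
    return ordered[:first_cut], ordered[first_cut:second_cut], ordered[second_cut:]


# Hypothetical data: 67 examinees, all inside the 24 to 35 month band
# (mental ages 2-0 through 2-11).
records = [(f"case-{i}", 24 + i % 12, 45 + 11 * i) for i in range(67)]
band = select_mental_age_band(records, 24, 35)
younger, middle, older = split_by_chronological_age(band)
```

With 67 cases these cut points give sub-groups of 22, 22, and 23, the sizes shown in Table 2. The sub-group sizes at other year levels do not all follow this pattern, so the rule should be read as one plausible choice only.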

For each of these three age groups the per- 
centage of individuals passing each test was 
determined and the reliability of the differ- 
ences in relative difficulty between the groups 
was calculated. Since these data are drawn 
from separate groups and hence the correla- 
tions between them may be assumed to be 
zero, the following formula was employed in 
calculating the critical ratios: 


$$\sigma_{\text{diff.}} = \sqrt{\sigma_{p_1}^{2} + \sigma_{p_2}^{2}}$$
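
As a worked check on the formula for separate groups, the sketch below computes a critical ratio from two percentages and their group sizes. It assumes that the standard error of each percentage is sqrt(pq/N); the text does not spell this out. The example uses the figures printed in Table 6 for repeating three digits (Test 6): 72% of 43 younger and 35% of 43 middle examinees.

```python
import math


def critical_ratio_independent(p1: float, n1: int, p2: float, n2: int) -> float:
    """Critical ratio for the difference between proportions from separate groups."""
    var1 = p1 * (1.0 - p1) / n1  # squared standard error of the first proportion
    var2 = p2 * (1.0 - p2) / n2  # squared standard error of the second proportion
    return (p1 - p2) / math.sqrt(var1 + var2)


# Table 6, Test 6: younger group 72% (N = 43), middle group 35% (N = 43).
cr = critical_ratio_independent(0.72, 43, 0.35, 43)
```

Under these assumptions the result rounds to the critical ratio of 3.70 reported in the text for this comparison, which suggests that the assumed form of the standard error matches the one used in the study.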


The data showing the percentage of indi- 
viduals passing each test in the three age 
groupings for each year level are presented 
in table form. These tables also show the 
mental age range included in the year level 
studied, and, in addition, the range and the 
mean for both the chronological age and the 
intelligence quotient for each of the three age 
groups under comparison. 


C. RESULTS 
1. Year II-0


The rank order of the tests for the Mental
Age group and for the Total group at year
level II-0 is presented in Table 1. In the
Mental Age group the percentage of exam- 
inees passing the various test items ranges 
from 94% for the form board to 75% for 
identifying parts of body. Due to the rela- 
tively small number of individuals involved, 
however, none of these differences reach sta- 
tistical significance. 

For the Total group; i.e., for all individ- 
uals who took the tests at this level, the 
percentage passing a test ranged from 99% 
for the form board to 89% for identifying 
parts of body. As is indicated in the extreme 
right column of Table 1, the percentages 
passing form board, 99%, and word combi- 
nation, 99%, are each reliably different from 
the percentages passing identifying objects by 
name, 91%, and identifying parts of body, 
89%, the two most difficult tests at this year
level.


TABLE 1
YEAR II-0: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 16)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Form board              94
4        Block tower             94
6        Word comb.              94
5        [illeg.]                81
2        Ident. by name          75
3        Parts of body           75

TOTAL GROUP (N = 105)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Form board              99         2, 3
6        Word comb.              99         2, 3
4        Block tower             96
5        [illeg.]                93
2        Ident. by name          91         1, 6
3        Parts of body           89         1, 6


TABLE 2
YEAR II-0: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 2-0 Through 2-11

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   22       3-9 to 8-7       5.8    23-76    [illeg.]     95   91   91  100   86   95
Middle    22       8-9 to 13-8      10.9   16-33    28.0        100   95   91   95   91  100
Older     23       14-0 to 73-9     20.5   13-20    17.5        100   83   74   95   86   95


A group was then selected with a limited
mental age range, from 2 years through 2
years, 11 months. There were 67 individuals
within this mental age range who had been
given the Year II-0 tests. As described
above, the records of these individuals, thus
selected for comparable mental ability, were
divided on the basis of chronological age into
three groups, a “younger,” a “middle,” and
an “older” group (Table 2). The percentages
of examinees passing the various tests for
these age groups show no statistically
significant differences. On the tests at this year
level, life age, experiences, and schooling
apparently have little effect upon performance.


2. Year II-6


The tests at the II-6 year level, arranged
in order of difficulty, are presented in Table 3.


TABLE 3
YEAR II-6: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
6        Form board: rot.        78         1, 2, 4
3        Naming obj.             72         4
5        Two digits              70
2        Parts of body           63         6
1        Ident. by use           61         6
4        Pict. vocab.            56         3, 6

TOTAL GROUP (N = 162)
No.      Name                    % Passing  Reliably diff. from Test No.
5        Two digits              80         1
3        Naming obj.             79
6        Form board: rot.        77
2        Parts of body           77
4        Pict. vocab.            75
1        Ident. by use           74


TABLE 4
YEAR II-6: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 2-0 Through 3-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   35       3-9 to 8-9       5.8    23-76    48.5         71   83   71   60   88   69
Middle    [illeg.] 8-9 to 14-0      10.8   16-38    25.1         59   62   76   76   76   71
Older     34       14-7 to 73-9     26.3   13-26    19.3         59   59   73   61   61   79







The actual difficulty values and the serial 
numbers of the items from which each test 
differs significantly can be obtained by in- 
spection of the table. Identifying objects by 
use and the picture vocabulary appear to be 
the most difficult tests at this year level. 

For the different age levels (Table 4), 
identifying parts of body (Test 2) again 
appears to be easier for the younger as com- 
pared with the older group, difference 24%, 
critical ratio 2.27. A similar difference was 
also found for this same test as it appears on 
the II-0 level, 17%, critical ratio 1.55. The
younger group also appears to do slightly 
better than the older group on repeating two 
digits (Test 5). However, none of the differ- 
ences between the three age groups at this 
level approach statistical significance. 


3. Year III-0


At the III-0 year level the percentages
passing each test show marked consistency 
(Table 5). In the case of both the Mental 
Age and Total groups, drawing a circle and 
stringing beads are easier than repeating 
three digits and picture memories, with dif- 
ferences that are clearly reliable. 

For the three age levels, repeating three 
digits (Test 6) is easiest for the younger 
group and most difficult for the middle group 
(Table 6). The difference between the
younger and middle groups on this test, 37%,
critical ratio 3.70, is the only difference at 
this year level within the range of statistical 
reliability. However, the difference on this 
same test between the older and younger 
groups is 25%, and a critical ratio of 2.44 
shows that there are 99 chances in 100 that 
this is a true difference. 
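
The conversion from a critical ratio to "chances in 100" is not described in the text. A minimal sketch, assuming the one-tailed area under the normal curve (an assumption, though it reproduces the figure quoted here), is:

```python
import math


def chances_in_100(critical_ratio: float) -> float:
    """Chances in 100 that the true difference exceeds zero (normal curve assumed)."""
    normal_cdf = 0.5 * (1.0 + math.erf(critical_ratio / math.sqrt(2.0)))
    return 100.0 * normal_cdf


# Critical ratio of 2.44 from the comparison of older and younger groups.
chances = chances_in_100(2.44)
```

On this reading a critical ratio of 2.44 corresponds to a little over 99 chances in 100, and the study's criterion of 3 corresponds to more than 99.8 chances in 100.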


4. Year III-6


The percentage passing each test at the 
III-6 year level is shown in Table 7. The 
only inconsistency in the order of difficulty is 
that picture vocabulary and identifying 
objects by use are rated fourth and fifth in 
order of difficulty for the Mental Age group 
and fifth and fourth respectively for the 
Total group. For the Mental Age and the 
Total groups, pictures I is reliably different 
from all other test items. For a low mentality 
group this item appears to be too difficult at 
this level to have much discriminative value, 
since only 10% of the Mental Age group and 
23% of the Total group were able to pass 
the test. For both groups, simple commands, 
comparison of sticks, and comprehension I 
are the easiest tests in the order named. 


At this level (Table 8) the only difference 
between the three age groups which ap- 
proaches statistical significance is comprehen- 
sion I (Test 6), the differences between the 


TABLE 5
YEAR III-0: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
5        Circle                  91         3, 4, 6
1        Stringing beads         88         4, 6
2        [illeg.]                77         4, 6
3        Block bridge            [illeg.]   4, 5
6        Three digits            54         1, 2, 5
4        Pict. memories          40         1, 2, 3, 5

TOTAL GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
5        Circle                  92         2, 3, 4, 6
1        Stringing beads         91         2, 3, 4, 6
3        Block bridge            82         1, 2, 4, 5, 6
2        [illeg.]                67         1, 3, 4, 5, 6
6        Three digits            61         1, 2, 3, 4, 5
4        Pict. memories          56         1, 2, 3, 5, 6


TABLE 6
YEAR III-0: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 2-6 Through 3-11

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   43       3-9 to 9-0       6.3    31-76    47.4         93   53   88   42   86   72
Middle    43       9-0 to 16-10     11.6   17-43    27.3         90   59   79   53   93   35
Older     43       17-2 to 73-9     26.5   17-26    22.3         98   67   77   51   95   47
















TABLE 7
YEAR III-6: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
1        Simple commands         70         4
3        Compar. sticks          58         4
6        Compre. I               58         4
2        Pict. vocab.            56         4
5        Ident. by use           54         4
4        Pictures I              10         1, 2, 3, 5, 6

TOTAL GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
1        Simple commands         63         2, 3, 4, 5, 6
3        Compar. sticks          58         1, 2, 4, 5
6        Compre. I               57         1, 2, 4
5        Ident. by use           54         1, 3, 4
2        Pict. vocab.            51         1, 3, 4, 6
4        Pictures I              23         1, 2, 3, 5, 6


TABLE 8
YEAR III-6: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 3-0 Through 4-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   [illeg.] 4-6 to 10-1      7.0    32-73    50.4         82   47   55   11   55   42
Middle    37       10-6 to 18-5     14.1   23-39    28.6         70   61   59   28   62   72
Older     38       18-6 to 58-11    31.0   20-30    25.6         63   61   68   29   55   71
older and younger groups and between the
middle and younger groups being 29% and
30% respectively. There are approximately
99 chances per hundred that these represent
true differences. Apparently increased life
age and experience give an examinee a slight
advantage on this test item.


5. Year IV-0


Table 9 shows the percentage passing the
various tests for the IV-0 year level. Both
the Mental Age and the Total groups find the
picture completion of a man: 1 point the
most difficult; the differences between this
and all other tests for this age level are
statistically significant for both groups with the
single exception of comprehension II for the
Mental Age group.


TABLE 9
YEAR IV-0: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 35)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Pict. vocab.            [illeg.]   3
2        Obj. from mem.          77         3
4        Pict. ident.            77         3
5        [illeg.]                71         3
6        Compre. II              59
3        Pict. compl.: man       48         1, 2, 4, 5

TOTAL GROUP (N = 271)
No.      Name                    % Passing  Reliably diff. from Test No.
2        Obj. from mem.          71         1, 3, 4, 5
6        Compre. II              70         1, 3, 5
4        Pict. ident.            67         1, 2, 3, 5
[illeg.] [illeg.]                63         2, 3, 4, 6
[illeg.] [illeg.]                [illeg.]   2, 3, 4, 6
3        Pict. compl.: man       57         1, 2, 4, 5, 6


TABLE 10
YEAR IV-0: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 3-6 Through 4-11

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   43       5-5 to 12-9      8.7    28-73    46.8         49   86   53   51   79   37
Middle    42       12-11 to 21-1    16.6   24-35    29.2         73   78   57   86   71   61
Older     42       21-2 to 54-11    30.1   [illeg.] [illeg.]     69   71   55   76   60   67

A greater percentage of the older group 
than of the younger group (Table 10) suc- 
ceeds on picture identification (Test 4) and 
also on comprehension II (Test 6). The dif- 
ference between the younger and older groups 
for the picture identification test is 25%, 
critical ratio 2.48, and that between the 
younger and middle groups is 35%, critical 
ratio 3.76. On the comprehension II test, the 
difference between the younger and older 
groups is 30%, the critical ratio showing that 
there are more than 99 chances in 100 that 
this difference is reliable. 

The older and middle groups also seem to 
have a slight advantage over the younger 
group on picture vocabulary (Test 1), and, 
although the differences are not statistically 
significant, there are, nevertheless, approxi- 
mately 98 chances in 100 that the obtained 
difference is reliable. 


6. Year IV-6 


At the IV-6 year level, test number 5, 
three commissions, is omitted and the Alter- 
nate test, picture identification, is substituted. 
The percentage passing for each of the tests 
on this age level is given in Table 11. For 
both the Mental Age and the Total groups, 


[Vol. 10, No. 3 


aesthetic appreciation and picture identifica- 
tion are reliably easier than all of the other 
tests. Opposite analogies appears to be the 
most difficult item in this group of tests and 
is reliably different from the other tests. 

In comparing the performance of the three 
age groups, we find that picture comparison 
(Test 3) is the only test for which a statis- 
tically reliable difference can be demonstrated 
(Table 12). On this item 37% more of the 
younger group than of the older group re- 
ceived a passing grade, the middle group 
coming between the two. 


7. Year V 


The tests of the V year level, listed in 
order of difficulty for both the Mental Age 
and Total groups, are given in Table 13. In 
both groups the square, folding a triangle, 
and definitions are reliably less difficult than 
picture completion of a man: 2 points and 
memory for sentences II. 


At this year level (Table 14) none of the 
differences between the various age groups 
can be considered completely reliable. The 
middle group did better than the older on the 
picture completion of a man (Test 1), while 
the older group seemed to have an advantage 
over the younger on definitions (Test 3), the 
critical ratios being 2.49 and 2.75 respec- 
tively. 


TABLE 11
YEAR IV-6: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 49)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Aesth. comp.            78         2, 3, 4, 6
A        Pict. ident.            78         2, 3, 4, 6
4        [illeg.]                [illeg.]   1, 6, A
3        Pict. compar.           55         1, 6, A
2        Four digits             47         1, 6, A
6        Opp. anal.              27         1, 2, 3, 4, A

TOTAL GROUP (N = 298)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Aesth. comp.            68         2, 3, 4, 6
A        Pict. ident.            66         2, 3, 4, 6
[illeg.] [illeg.]                53         1, 6, A
2        Four digits             51         1, 6, A
[illeg.] [illeg.]                50         1, 6, A
6        Opp. anal.              41         1, 2, 3, 4, A


TABLE 12
YEAR IV-6: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 4-0 Through 5-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    6 Alt.
Younger   47       5-6 to 14-6      10.0   31-73    50.0         79   45   74   51   43   70
Middle    47       14-6 to 22-3     18.6   [illeg.] 32.3         68   49   57   49   29   70
Older     48       22-3 to 54-11    30.3   [illeg.] 32.2         73   54   37   69   37   87










TABLE 13
YEAR V: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 145)
No.      Name                    % Passing  Reliably diff. from Test No.
4        Square                  [illeg.]   1, 5, 6
2        Folding triangle        93         1, 5
3        Definitions             93         1, 5
6        Counting obj.           88         1, 4, 5
1        Pict. compl.: man (2)   81         2, 3, 4, 5, 6
5        Sent. mem. II           56         1, 2, 3, 4, 6

TOTAL GROUP (N = 462)
No.      Name                    % Passing  Reliably diff. from Test No.
2        Folding triangle        88         1, 3, 5, 6
4        Square                  87         1, 5, 6
3        Definitions             86         1, 2, 5, 6
6        Counting obj.           81         1, 2, 3, 4, 5
5        Sent. mem. II           57         1, 2, 3, 4, 6
1        Pict. compl.: man (2)   52         2, 3, 4, 5, 6


TABLE 14
YEAR V: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 4-6 Through 6-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   [illeg.] 8-1 to 14-7      11.0   32-72    60.2         81   92   88   96   55   91
Middle    95       14-8 to 21-7     17.1   30-45    38.4         88   90   95   94   62   85
Older     94       21-10 to 57-0    28.9   30-43    37.1         74   91   98   97   65   86


8. Year VI


Because of the large number of cases upon
which the data of the following year levels
are based, even small differences in relative
difficulty are statistically reliable. At the VI
year level in both the Mental Age and the
Total groups (Table 15), all tests show reliably
different levels of difficulty with the following
three exceptions: for the Mental Age
group, the difference between vocabulary and
maze; and for the Total group, the difference
between numbers and mutilated pictures, and
also that between maze and picture comparison.

On vocabulary (Test 1) 28% more of the
older group than of the younger group received
a passing score (critical ratio 5.04),
with the percentage for the middle group


TABLE 15
YEAR VI: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 164)
No.      Name                    % Passing  Reliably diff. from Test No.
3        Mut. pict.              96         1, 2, 4, 5, 6
4        Number concepts         89         1, 2, 3, 5, 6
1        Vocabulary              82         2, 3, 4, 5
6        Maze                    79         2, 3, 4, 5
5        Pict. compar.           73         1, 2, 3, 4, 6
2        Bead chain I            66         1, 3, 4, 5, 6

TOTAL GROUP (N = 711)
No.      Name                    % Passing  Reliably diff. from Test No.
4        Number concepts         76         1, 2, 5, 6
3        Mut. pict.              [illeg.]   1, 2, 5, 6
1        Vocabulary              68         2, 3, 4, 5, 6
6        Maze                    66         1, 2, 3, 4
5        Pict. compar.           66         1, 2, 3, 4
2        Bead chain I            63         1, 3, 4, 5, 6


TABLE 16
YEAR VI: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 5-6 Through 7-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   109      8-8 to 14-8      11.4   39-76    54.7         61   86   96   91   88   79
Middle    109      14-9 to 20-10    17.1   37-51    43.4         76   68   94   85   83   75
Older     109      21-0 to 62-2     28.9   37-49    42.6         89   56   85   90   57   65
















between these two (Table 16). On the other 
hand, the younger group did reliably better 
than either the middle or older group on the 
bead chain (Test 2). Test 5, picture compar- 
ison, gave reliably different results in the 
percentage passing, the older group being 
31% and 26% below the young and middle 
groups respectively. 

Although both the younger and the middle 
groups had less difficulty than the older group 
with the mazes (Test 6), in neither case was 
the difference completely reliable, the critical 
ratios being 2.06 and 1.63 for the differences 
between the younger and older groups and 
between the middle and older groups 
respectively. 


9. Year VII


The tests for the VII year level, arranged 
in order of difficulty, are given in Table 17. 
Tests in this group show a wide variation in 
relative difficulty, since the percentage pass- 
ing ranges from 82% to 33% for the Mental 
Age group and from 72% to 26% for the 
Total group. Opposite analogies seems to 
offer the most difficulty at this level, for only 
26% of the 735 individuals to whom this test 
was given were able to pass it. 


Of the various tests at this year level 
(Table 18), Test 1, picture absurdities, is the 
only one which shows a reliably different
result for the age groups: 23% more of the
younger group than of the older were able to 
pass this test. 

Although the older group did better than 
either the middle or the younger on the two 
similarities item (Test 2), the differences 
were not completely reliable, the critical 
ratios being 2.40 and 2.55 respectively. 


10. Year VIII 


All of the differences in relative difficulty 
of the tests at the VIII year level (Table 19) 
can be considered as statistically reliable 
with one exception: in the Mental Age 
group verbal absurdities and similarities and 
differences are not reliably different from 
each other. The order of difficulty is the 
same for both groups, vocabulary being the 
easiest and memory for sentences III the 
most difficult test at this year level. 

The older patients (Table 20) perform de- 
cidedly better on the vocabulary (Test 1) 
than the younger; the difference is 27%, 
critical ratio 4.75. On the Wet Fall item 
(Test 2), however, only 42% of the older 
group as compared with 80% of the younger 
were able to pass. On this test the perform- 
ance of the older group was not only reliably 
poorer than that of the younger, but reliably 
poorer than that of the middle group as well. 

A higher percentage of the younger exam- 
inees than of the older ones passed on simi- 


TABLE 17
YEAR VII: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 141)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Pict. absurd. I         82         2, 3, 5, 6
4        Compre. III             75         2, 5
3        [illeg.]                74         1, 2, 5
6        Five digits             69         1, 2, 5
2        Two simil.              57         1, 3, 4, 5, 6
5        Opp. anal. I            33         1, 2, 3, 4, 6

TOTAL GROUP (N = 735)
No.      Name                    % Passing  Reliably diff. from Test No.
4        Compre. III             72         1, 2, 3, 5, 6
3        [illeg.]                58         1, 2, 4, 5, 6
1        Pict. absurd. I         57         2, 3, 4, 5, 6
6        Five digits             52         1, 2, 3, 4, 5
2        Two simil.              46         1, 3, 4, 5, 6
5        Opp. anal. I            26         1, 2, 3, 4, 6


TABLE 18
YEAR VII: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 6-6 Through 8-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   95       9-5 to 15-8      12.7   45-78    67.6         88   49   73   65   33   62
Middle    95       15-9 to 22-3     18.2   48-56    49.1         76   48   71   68   33   62
Older     95       22-4 to 66-0     30.6   48-56    49.7         65   66   73   77   34   69






















TABLE 19
YEAR VIII: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 113)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Vocabulary              92         2, 3, 4, 5, 6
5        Compre. IV              76         1, 2, 3, 4, 6
2        Wet Fall                65         1, 3, 4, 5, 6
3        Verb. absurd. I         64         1, 2, 5, 6
4        Simil. and diff.        53         1, 2, 5, 6
6        Sent. mem. III          39         1, 2, 3, 4, 5

TOTAL GROUP (N = 717)
No.      Name                    % Passing  Reliably diff. from Test No.
1        Vocabulary              60         2, 3, 4, 5, 6
5        Compre. IV              55         1, 2, 3, 4, 6
2        Wet Fall                47         1, 3, 4, 5, 6
3        Verb. absurd. I         44         1, 2, 4, 5, 6
4        Simil. and diff.        [illeg.]   1, 2, 3, 5, 6
6        Sent. mem. III          37         1, 2, 3, 4, 5


TABLE 20
YEAR VIII: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 7-6 Through 9-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   [illeg.] 9-0 to 16-6      13.6   51-78    60.0         70   80   55   64   76   45
Middle    74       16-7 to 23-7     19.0   50-63    55.3         88   72   50   59   69   35
Older     74       23-10 to 81-10   33.0   50-62    55.4         97   42   49   42   78   56


larities and differences (Test 4) also, there 
being 99 chances in 100 that the difference, 
22%, is a true difference. 


On memory for sentences III (Test 6) the 
middle group rated lower than either the 
younger or the older group. Although the dif- 
ference is not completely reliable, more of 
the older group than of the younger group 
passed this test. 





11. Year IX 


At the IX year level (Table 21), 46% of 
the Total group could solve the making 
change test as compared with only 26% who 
were able to solve verbal absurdities II. The
table shows the tests arranged in order of 
difficulty, the percentage passing each test, 
and the serial number of each test from which 
its difficulty value is reliably different. 


TABLE 21
YEAR IX: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = 84)
No.      Name                    % Passing  Reliably diff. from Test No.
5        Making change           82         1, 2, 3, 4, 6
6        Four digits rev.        66         1, 2, 4, 5
3        Designs                 [illeg.]   1, 2, 4, 5
1        Paper cutting I         52         2, 3, 5, 6
4        [illeg.]                50         3, 5, 6
2        Verb. absurd. II        42         1, 3, 5, 6

TOTAL GROUP (N = 682)
No.      Name                    % Passing  Reliably diff. from Test No.
5        Making change           46         1, 2, 4, 6
3        Designs                 45         1, 2, 4, 6
6        Four digits rev.        41         1, 2, 3, 4, 5
1        Paper cutting I         39         2, 3, 4, 5, 6
4        [illeg.]                [illeg.]   1, 2, 3, 5, 6
2        Verb. absurd. II        26         1, 3, 4, 5, 6


TABLE 22
YEAR IX: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 8-6 Through 10-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   55       11-5 to 16-7     14.5   57-79    65.6         64   44   76   53   65   64
Middle    54       16-9 to 22-9     19.2   57-69    63.9         59   48   66   57   76   56
Older     54       22-11 to 81-10   34.1   57-69    62.0         37   33   48   43   76   68













In comparing the influence of life age upon 
test performance at this level (Table 22), we 
find that designs (Test 3) is the only test 
having a truly reliable difference, 28% more 
of the younger than of the older group being 
able to pass the test item. 


Although the difference is not completely 
reliable, the older group did less well on the 
paper cutting I (Test 1) than either the 
middle or younger group, the critical ratios 
being 2.35 and 2.93 respectively. 


12. Year X 


At the X year level the order of difficulty 
for the test items is identical for the Mental 
Age group and the Total group. Table 23 
shows the tests of this level listed in order 
of difficulty. 


Picture absurdities II (Test 2) shows a 
reliable difference in favor of the younger 
group, the percentage passing being 43% 
higher for this group than for the older group 
(Table 24). Vocabulary (Test 1) again 
appears to be easier for the older group than 
for the younger, and, although the critical 
ratio of 2.86 shows that this difference is not 
completely reliable, there are, nevertheless, 
99 chances in 100 that this difference, 23%, 
is a true difference. 




13. Year XI 


Table 25 shows the data for the various 
tests on the XI year level. For the Total 
group the percentage passing ranges from 
54% for problem situation to 18% for verbal 
absurdities III. 


None of the test items show a clear and 
reliable change with life age. Although 30% 
more of the middle than of the older group 
were able to pass the verbal absurdities III 
(Test 2), with the younger group between 
the two, the differences are not reliable. 


Age seems to act as a slight handicap for 
three similarities (Test 6) also. On this item 
the middle group again received the highest 
rating, 61% of the middle group and 41% 
of the younger group, as compared with only 
28% of the older group, being able to pass 
the test. 


D. SUMMARY AND CONCLUSIONS 


The test performances on the 1937 Revi- 
sion of the Stanford-Binet, Form L, of one 
thousand patients at a state institution for 
mental defectives were analyzed to determine 
the relative difficulty of test items at year 
levels from Year II through Year XI. Only 
patients who had an intelligence quotient of 
less than 80 were included in this study. 


TABLE 23
YEAR X: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
1        Vocabulary              77         2, 3, 4, 6
5        Word naming             67         3, 4
6        [illeg.]                [illeg.]   1, 3, 4
2        Pict. absurd. II        63         1, 3, 4
[illeg.] [illeg.]                [illeg.]   1, 2, 5, 6
[illeg.] [illeg.]                48         1, 2, 5, 6

TOTAL GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
1        Vocabulary              45         2, 3, 4, 6
5        Word naming             44         2, 3, 4, 6
6        [illeg.]                [illeg.]   1, 3, 4, 5
2        Pict. absurd. II        42         1, 3, 4, 5
[illeg.] [illeg.]                [illeg.]   1, 2, 5, 6
[illeg.] [illeg.]                27         1, 2, 5, 6


TABLE 24
YEAR X: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 9-6 Through 11-5

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   48       12-8 to 17-9     16.3   63-79    70.8         67   83   42   48   69   62
Middle    47       17-11 to 21-11   19.1   63-76    69.8         77   60   62   55   64   55
Older     48       22-1 to 46-9     27.8   63-76    69.1         90   40   46   52   62   67


































TABLE 25
YEAR XI: RELATIVE DIFFICULTY AND RELIABILITY OF DIFFERENCES

MENTAL AGE GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
4        Sent. mem. IV           77         1, 2, 6
3        Abstr. wds. I           75         1, 2, 6
5        Problem sit.            65         1, 6
2        Verb. absurd. III       [illeg.]   3, 4
6        Three simil.            49         3, 4, 5
1        [illeg.]                43         3, 4, 5

TOTAL GROUP (N = [illeg.])
No.      Name                    % Passing  Reliably diff. from Test No.
5        Problem sit.            54         1, 2, 3, 4, 6
4        Sent. mem. IV           33         1, 2, 3, 5, 6
3        Abstr. wds. I           30         1, 2, 4, 5, 6
1        [illeg.]                [illeg.]   2, 3, 4, 5
6        Three simil.            [illeg.]   2, 3, 4, 5
2        Verb. absurd. III       18         1, 3, 4, 5, 6


TABLE 26
YEAR XI: PERCENTAGE PASSING EACH TEST FOR THE THREE AGE GROUPS
M. A. Range: 10-6 Through 11-10

                                           Intelligence
                   Chronological Age       Quotient           PERCENTAGE PASSING TEST
Group     N        Range            Ave.   Range    Ave.          1    2    3    4    5    6
Younger   29       14-0 to 19-5     16.8   70-79    73.9         41   52   69   65   62   41
Middle    28       19-6 to 23-8     21.0   70-78    74.1         38   61   78   71   64   61
Older     29       23-9 to 73-5     31.0   70-79    74.1         31   31   72   69   72   28


For this group the data show considerable 
and reliable differences in difficulty among 
the tests within each year level. If these 
tests are re-arranged according to relative 
difficulty, the new order will show variations 
from the order in which the test items are 
arranged in the standardized examination. 
Certain patterns of difficulty become evident 
upon inspection of the tables: for these 
patients tests like verbal absurdities, sentence 
memories, reasons, and completing the pic- 
ture of a man offer greater difficulties than 
other items such as the vocabulary and com- 
prehension groups of tests. 


The same data were also studied to deter- 
mine the influence of life age upon test per- 
formance. Analysis shows that the younger 
groups give a reliably or nearly reliably bet- 
ter performance than the older groups of 
comparable mental ability on tests involving 
pointing or actual manual performance on 
the part of the examinee; that is, on tests 
such as identifying parts of body, picture 
comparison, commands, bead chains, mazes, 
designs, and paper cutting, as well as on tests 
involving a humorous situation, as, for ex- 
ample, picture absurdities and Wet Fall. 


The older groups, on the other hand, found 
less difficulty than the younger groups of like
mental ability on tests of vocabulary, definitions,
picture identification, and test items
involving comprehension. Performance on 
these more verbal tests apparently is affected 
to a greater than ordinary degree by life age 
and by school and general experiences. 
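
As an illustration of how a difference between two age groups in
percentage passing might be checked for reliability, the sketch below
computes a critical ratio: the difference between two independent
percentages divided by the standard error of that difference. The
criterion of reliability used in the study is not restated in this
section, so the unpooled standard-error formula, the function name
critical_ratio, and the choice of Test 6 as the example are all
assumptions made for illustration only. The figures are those given in
Table 26 for the Younger and Older groups.

```python
import math


def critical_ratio(pct_a, n_a, pct_b, n_b):
    """Difference between two independent percentages over its standard error.

    Hypothetical helper: uses the unpooled standard error of a difference
    between two independent proportions. This is an assumption, not the
    article's stated procedure.
    """
    p_a = pct_a / 100.0
    p_b = pct_b / 100.0
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se_diff == 0:
        raise ValueError("standard error is zero; the ratio is undefined")
    return (p_a - p_b) / se_diff


# Table 26, Year XI, Test 6: Younger group 41% of 29, Older group 28% of 29.
cr_test6 = critical_ratio(41, 29, 28, 29)
print(round(cr_test6, 2))
```

A ratio of this kind is then compared with whatever threshold the
investigator has adopted for calling a difference reliable. With groups
of fewer than thirty subjects, fairly large differences in percentage
can still yield a small ratio.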


Our data seem to indicate that for subjects 
of low intelligence, certain types of tests tend 
to be consistently more difficult than others, 
and, as a result, definite patterns of relative 
difficulty of the tests within the various year 
levels appear. A complete and qualitative 
evaluation of test performance, therefore, 
must necessarily include a consideration of 
the relative difficulty values of the tests 
within each year group, particularly those 
tests passed or failed at the upper and lower 
limits of the standard testing range. 


From our study we may also conclude that 
increased life age and experience appear to 
have a differential influence upon perform- 
ance on certain test items, and therefore, in 
psychometric work with individuals of low 
intelligence, at least, an understanding of the 
relative effect of chronological age will also 
be valuable whenever a qualitative interpre- 
tation of test data is desired. 







REFERENCES 


1. Gillette, Annette L. “Relative Difficulty
of Tests Within Each Year Level of
Revised Stanford-Binet, Form L, Years
Six Through Twelve,” Journal of Psychology,
XII (1941), 125-138.

2. Growdon, C. H. “The Revised Stanford-Binet
Scale Applied as a Point Scale,”
Journal of Applied Psychology, XXV
(1941), 660-671.




3. Martinson, Betty, and Strauss, Alfred A.
“A Method of Clinical Evaluation of the
Responses to the Stanford-Binet Intelligence
Test,” American Journal of Mental
Deficiency, XLVI (1941), 48-59.

4. Strauss, A. A., and Werner, H. “Qualitative
Analysis of the Stanford-Binet Test,”
American Journal of Mental Deficiency,
XLV (1940), 50-55.








